Preface 


This book has grown out of the author’s research work and teaching experience in the 
field of adaptive signal processing as well as signal processing applications to a variety of 
communication systems. The second edition of this book, while preserving the presentation 
of the basic theory of adaptive filters as in the first edition, expands significantly on a 
broad range of applications of adaptive filters. Six new chapters are added that look into 
various applications of adaptive filters. 

This book is designed to be used as a text to teach graduate-level courses in adaptive 
filters at different levels. It is also intended to serve as a technical reference for practicing 
engineers. 

A typical one-semester introductory course on adaptive filters may cover Chapters 1, 
3-6, and 12, and the first half of Chapter 11, in depth. Chapter 2, which contains a short 
review of the basic concepts of the discrete-time signals and systems, and some related 
concepts from random signal analyses, may be left as self-study material for students. 
Selected parts of the rest of this book may also be taught in the same semester, or a 
broader range of chapters may be used for a second-semester course on advanced topics 
and applications. 

In the study of adaptive filters, computer simulations constitute an important supple- 
mental component to theoretical analyses and deductions. Often, theoretical developments 
and analyses involve a number of approximations and/or assumptions. Hence, computer 
simulations become necessary to confirm the theoretical results. Apart from this, com- 
puter simulation turns out to be a necessity in the study of adaptive filters for gaining an 
in-depth understanding of the behavior and properties of the various adaptive algorithms. 
MATLAB® from MathWorks Inc. appears to be the most commonly used software simula- 
tion package. Throughout this book, MATLAB® is used to present a number of simulation 
results to clarify and/or confirm the theoretical developments. The programs as well as 
data files used for generating these results can be downloaded from the accompanying 
website of this book at www.wiley.com/go/adaptive_filters 

Another integral part of this text is exercise problems at the end of chapters. With the 
exception of the first few chapters, two kinds of exercise problems are provided in each 
chapter: 


1. The usual problem exercises. These problems are designed to sharpen the readers’ 
skill in theoretical development. They are designed to extend results developed 
in the text and illustrate applications to practical problems. Solutions to these 
problems are available to instructors on the companion website to the book: 
www.wiley.com/go/adaptive_filters 

2. Simulation-oriented problems. These involve computer simulations and are designed 
to enhance the readers’ understanding of the behavior of the different adaptive algorithms 
that are introduced in the text. Most of these problems are based on the 
MATLAB® programs, which are provided on the accompanying website. In addition, 
there are also other (open-ended) simulation-oriented problems, which are designed to 
help the readers to develop their own programs and prepare them to experiment with 
practical problems. 


The book assumes that the reader has some background in discrete-time signals and 
systems (including an introduction to linear system theory and random signal analysis), 
complex variable theory, and matrix algebra. However, a review of these topics is provided 
in Chapters 2 and 4. 

This book starts with a general overview of adaptive filters in Chapter 1. Many examples 
of applications such as system modeling, channel equalization, echo cancellation, and 
antenna arrays are reviewed in this chapter. This is followed by a brief review of discrete-time 
signals and systems in Chapter 2, which puts the related concepts in a framework 
appropriate for the rest of this book. 

In Chapter 3, we introduce a class of optimum linear systems collectively known as 
Wiener filters. Wiener filters are fundamental to the implementation of adaptive filters. 
We note that the cost function used to formulate the Wiener filters is an elegant choice, 
leading to a mathematically tractable problem. We also discuss the unconstrained Wiener 
filters with respect to causality and duration of the filter impulse response. This study 
reveals many interesting aspects of Wiener filters and establishes a good foundation for 
the study of adaptive filters for the rest of this book. In particular, we find that, in the 
limit, when the filter length tends to infinity, a Wiener filter treats different frequency 
components of underlying processes separately. Numerical examples reveal that when the 
filter length is limited, separation of frequency components may be replaced by separation 
of frequency bands within a good approximation. This treatment of adaptive filters, which 
is pursued throughout this book, turns out to be an enlightening engineering approach for 
the study of adaptive filters. 

Eigenanalysis is an essential mathematical tool for the study of adaptive filters. A 
thorough treatment of this topic is covered in the first half of Chapter 4. The second 
half of this chapter gives an analysis of the performance surface of transversal Wiener 
filters. This is followed by search methods, which are introduced in Chapter 5. The search 
methods discussed in this chapter are idealized versions of the statistical search methods 
that are used in practice for actual implementation of adaptive filters. They are idealized in 
the sense that the statistics of the underlying processes are assumed to be known a priori. 

The celebrated least-mean-square (LMS) algorithm is introduced in Chapter 6 and 
extensively studied in Chapters 7-11. The LMS algorithm, which was first proposed 
by Widrow and Hoff in the 1960s, is the most widely used adaptive filtering algorithm in 
practice, owing to its simplicity and robustness to signal statistics. 

Chapters 12 and 13 are devoted to the method of least-squares. This discussion, although 
brief, gives the basic concept of the method of least-squares and highlights its advantages 
and disadvantages compared to the LMS-based algorithms. In Chapter 13, the reader is 
introduced to the fast versions of least-squares algorithms. Overall, these two chapters lay 
a good foundation for the reader to continue his/her study of this subject with reference 
to more advanced books and/or papers. 

The problem of tracking is discussed in Chapter 14. In the context of a system modeling 
problem, we present a generalized formulation of the LMS algorithm, which covers most 
of the algorithms that are discussed in the previous chapters of this book, thus bringing a 
common platform for comparison of different algorithms. We also discuss how the step- 
size parameter(s) of the LMS algorithm and the forgetting factor of the RLS algorithm 
may be optimized for achieving good tracking behavior. 

Chapters 15-20 cover a range of applications where the theoretical results of the 
previous chapters are applied to a wide range of practical problems. Chapter 15 presents 
a number of practical problems related to echo cancelers. The chapter emphasis is on 
acoustic echo cancellation that is encountered in teleconferencing applications. In such 
applications, one has to deal with specific problems that do not fall into the domain of 
the traditional theory of adaptive filters. For instance, when both parties at the two sides 
of the conferencing line talk simultaneously, their respective signals interfere with one 
another and hence, the adaptation of both echo cancelers on the two sides of the line may 
be disrupted. Therefore, specific double-talk detection methods should be designed. The 
stereophonic acoustic echo cancelers that have gained some momentum in recent years 
are also discussed in detail. 

Chapter 16 presents and discusses the underlying problems related to active noise 
cancellation control, which are also somewhat different from the traditional adaptive 
filtering problems. 

Chapter 17 is devoted to the issues related to synchronization and channel equalization 
in communication systems. Although many fundamentals of the classical theory of adaptive 
filters have been developed in the context of channel equalization, there are a number 
of specific issues in the domain of communication systems that can only be presented as 
new concepts, which may be thought of as extensions of the classical theory of adaptive 
filters. Many such extensions are presented in this chapter. 

Sensor array processing and code division multiple access (CDMA) are two areas where 
adaptive filters have been used extensively. Although these seem to be two very different 
applications, there are a number of similarities that, if understood, allow one to use results 
of one application for the other as well. As sensor array processing has been developed 
well ahead of CDMA, we follow this historical development and present sensor array 
processing techniques in Chapter 18, followed by a presentation of CDMA theory and 
the relevant algorithms in Chapter 19. 

Recent advances in adaptive filters as applied to the design and implementation of 
multicarrier systems (such as orthogonal frequency division multiplexing, OFDM) 
and communication systems with multiple antennas at both transmitter and receiver sides 
(known as multi-input multi-output, MIMO) are discussed in Chapter 20. This chapter 
explains some practical issues related to these modern signal processing techniques and 
presents a few solutions that have been adopted in the current standards, e.g., WiFi, 
WiMax, and LTE. 

The following notations are adopted in this book. We use nonbold lowercase letters for 
scalar quantities, bold lowercase for vectors, and bold uppercase for matrices. Nonbold 
uppercase letters are used for functions of variables, such as H(z), and lengths/dimensions 
of vectors/matrices. The lowercase letter “n” is used for the time index. In the case of 
block processing algorithms, such as those discussed in Chapters 8 and 9, we reserve the 
lowercase letter “k” as the block index. The time and block indices are put in brackets, 
while subscripts are used to refer to elements of vectors and matrices. For example, the 
ith element of the time-varying tap-weight vector w(n) is denoted as w_i(n). The superscripts 
“T” and “H” denote vector or matrix transposition and Hermitian transposition, 
respectively. We keep all vectors in column form. More specific notations are explained 
in the text as and when found necessary. 
Introduction 


As we begin our study of “adaptive filters,” it may be worth trying to understand the 
meaning of the terms adaptive and filters in a very general sense. The adjective “adaptive” 
can be understood by considering a system that is trying to adjust itself so as to respond 
to some phenomenon that is taking place in its surroundings. In other words, the system 
tries to adjust its parameters with the aim of meeting some well-defined goal or target that 
depends on the state of the system as well as its surroundings. This is what “adaptation” 
means. Moreover, there is a need to have a set of steps or a certain procedure by which 
this process of “adaptation” is carried out. And finally, the “system” that carries out and 
undergoes the process of “adaptation” is called by the more technical, yet general enough, 
name “filter” — a term that is very familiar to and a favorite of any engineer. Clearly, 
depending on the time required to meet the final target of the adaptation process, which 
we call convergence time, and the complexity/resources that are available to carry out the 
adaptation, we can have a variety of adaptation algorithms and filter structures. From this 
point of view, we may summarize the contents/contribution of this book as “the study of 
some selected adaptive algorithms and their implementations along with the associated 
filter structures from the points of view of their convergence and complexity performance.” 


1.1 Linear Filters 


The term filter is commonly used to refer to any device or system that takes a mixture of 
particles/elements from its input and processes them according to some specific rules to 
generate a corresponding set of particles/elements at its output. In the context of signals 
and systems, particles/elements are the frequency components of the underlying signals 
and, traditionally, filters are used to retain all the frequency components that belong to a 
particular band of frequencies, while rejecting the rest of them, as much as possible. In a 
more general sense, the term filter may be used to refer to a system that reshapes the fre- 
quency components of the input to generate an output signal with some desirable features, 
and this is how we view the concept of filtering throughout the chapters which follow. 
Filters (or systems, in general) may be either linear or nonlinear. In this book, we 
consider only linear filters and our emphasis will also be on discrete-time signals and sys- 
tems. Thus, all the signals will be represented by sequences, such as x(n). The most basic 
feature of linear systems is that their behavior is governed by the principle of superposi- 
tion. This means that if the responses of a linear discrete-time system to input sequences 
x_1(n) and x_2(n) are y_1(n) and y_2(n), respectively, the response of the same system to the 
input sequence x(n) = a x_1(n) + b x_2(n), where a and b are arbitrary constants, will be 
y(n) = a y_1(n) + b y_2(n). This property leads to many interesting results in “linear system 
theory.” In particular, a linear system is completely characterized by its impulse response 
or the Fourier transform of its impulse response, known as the transfer function. The transfer 
function of a system at any frequency is equal to its gain at that frequency. In other words, 
in the context of our discussion above, we may say that the transfer function of a system 
determines how the various frequency components of its input are reshaped by the system. 

Figure 1.1 Schematic diagram of a filter emphasizing its role of reshaping the input signal to 
match the desired signal. 

Figure 1.1 depicts a general schematic diagram of a filter emphasizing the purpose for 
which it is used in different problems addressed/discussed in this book. In particular, the 
filter is used to reshape a certain input signal in such a way that its output is a good estimate 
of the given desired signal. The process of selecting the filter parameters (coefficients) so 
as to achieve the best match between the desired signal and the filter output is often done 
by optimizing an appropriately defined performance function. The performance function 
can be defined in a statistical or deterministic framework. In the statistical approach, 
the most commonly used performance function is the mean-squared value of the error 
signal, that is, the difference between the desired signal and the filter output. For stationary 
input and desired signals, minimizing the mean squared error (MSE) results in the well- 
known Wiener filter, which is said to be optimum in the mean-square sense. The subject 
of Wiener filters is extensively covered in Chapter 3. Most of the adaptive algorithms 
that are studied in this book are practical solutions to Wiener filters. In the deterministic 
approach, the usual choice of performance function is a weighted sum of the squared error 
signal. Minimizing this function results in a filter that is optimum for the given set of 
data. However, under some assumptions on certain statistical properties of the data, the 
deterministic solution will approach the statistical solution, that is, the Wiener filter, for 
large data lengths. Chapters 12 and 13 deal with the deterministic approach in detail. We 
refer the reader to Section 1.4 for a brief overview of the adaptive formulations under the 
stochastic (i.e., statistical) and deterministic frameworks. 


1.2 Adaptive Filters 


As we mentioned in the previous section, the filter required for estimating the given desired 
signal can be designed using either the stochastic or the deterministic formulations. In 
the deterministic formulation, the filter design requires the computation of certain average 
quantities using the given set of data that the filter should process. On the other hand, the 
design of a Wiener filter (i.e., in the stochastic approach) requires a priori knowledge of 
the statistics of the underlying signals. Strictly speaking, a large number of realizations of 
the underlying signal sequences are required for reliably estimating these statistics. This 
procedure is practically not feasible because we usually have only one realization for each 
of the signal sequences. To resolve this problem, it is assumed that the underlying signal 
sequences are ergodic, which means that they are stationary and their statistical and time 
averages are identical. Thus, using the time averages, Wiener filters can be designed, even 
though there is only one realization for each of the signal sequences. 
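
As an illustration of designing a Wiener filter from time averages, the following Python sketch estimates the tap-input correlation matrix R and the cross-correlation vector p from a single realization and then solves R w = p. The relation R w = p is the standard Wiener-Hopf equation (treated in Chapter 3, not in this section), and the function names, the two-tap plant, and the signal lengths are assumptions made only for this example.

```python
import random


def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - known) / M[r][r]
    return w


def time_average_wiener(x, d, num_taps):
    """Estimate Wiener tap weights from one realization of x(n) and d(n).

    Ensemble averages are replaced by time averages (ergodicity assumed).
    """
    R = [[0.0] * num_taps for _ in range(num_taps)]
    p = [0.0] * num_taps
    count = len(x) - (num_taps - 1)
    for n in range(num_taps - 1, len(x)):
        taps = [x[n - i] for i in range(num_taps)]  # x(n), x(n-1), ...
        for i in range(num_taps):
            p[i] += d[n] * taps[i] / count
            for j in range(num_taps):
                R[i][j] += taps[i] * taps[j] / count
    return solve_linear(R, p)


if __name__ == "__main__":
    random.seed(0)
    x = [random.gauss(0.0, 1.0) for _ in range(5000)]
    # Hypothetical desired signal: a two-tap system plus a little noise.
    d = [0.8 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0)
         + random.gauss(0.0, 0.05) for n in range(len(x))]
    print(time_average_wiener(x, d, num_taps=2))
```

Note that this direct approach needs the whole data record before any coefficient is available, which is one of the motivations for the iterative solutions discussed next.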

Although direct measurement of the signal averages to obtain the necessary information 
for the design of Wiener or other optimum filters is possible, in most of the applications, 
the signal averages (statistics) are used in an indirect manner. All the algorithms that are 
covered in this book take the output error of the filter, correlate that with the samples 
of filter input in some way, and use the result in a recursive equation to adjust the filter 
coefficients iteratively. The reasons for solving the problem of adaptive filtering in an 
iterative manner are as follows: 


1. Direct computation of the necessary averages and their application for computing the 
filter coefficients requires accumulation of a large amount of signal samples. Iterative 
solutions, on the other hand, do not require accumulation of signal samples, thereby 
resulting in a significant amount of saving in memory. 

2. Accumulation of signal samples and their postprocessing to generate the filter output, 
as required in noniterative solutions, introduces a large delay in the filter output. This is 
unacceptable in many applications. Iterative solutions, on the contrary, do not introduce 
any significant delay in the filter output. 

3. The use of iterations results in adaptive solutions with some tracking capability. That 
is, if the signal statistics are changing with time, the solution provided by an iterative 
adjustment of the filter coefficients will be able to adapt to the new statistics. 

4. Iterative solutions, in general, are much simpler to code in software or implement in 
hardware than their noniterative counterparts. 


1.3 Adaptive Filter Structures 


The most commonly used structure in the implementation of adaptive filters is the 
transversal structure, depicted in Figure 1.2. Here, the adaptive filter has a single input, 
x(n), and an output, y(n). The sequence d(n) is the desired signal. The output, y(n), is 
generated as a linear combination of the delayed samples of the input sequence, x(n), 
according to Equation (1.1) 


$$y(n) = \sum_{i=0}^{N-1} w_i(n)\, x(n-i) \tag{1.1}$$

where the w_i(n)'s are the filter tap weights (coefficients) and N is the filter length. We refer 
to the input samples, x(n - i), for i = 0, 1, ..., N - 1, as the filter tap inputs. The tap 
weights, the w_i(n)'s, which may vary with time, are controlled by the adaptation algorithm. 
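
A minimal Python sketch of Equation (1.1) is given below for a single time index. The function name is hypothetical, the tap weights are held fixed for simplicity (in an adaptive filter they would change with n), and input samples before time zero are assumed to be zero.

```python
def transversal_output(weights, x, n):
    """Return y(n) = sum_{i=0}^{N-1} w_i x(n-i) for a transversal filter.

    weights : list of N tap weights w_0, ..., w_{N-1}
    x       : list of input samples x(0), x(1), ...
    n       : time index, 0 <= n < len(x)
    Samples x(n-i) with n-i < 0 are taken as zero (an assumption).
    """
    return sum(w * (x[n - i] if n - i >= 0 else 0.0)
               for i, w in enumerate(weights))


if __name__ == "__main__":
    w = [0.5, 0.3, 0.2]
    x = [1.0, 2.0, 3.0, 4.0]
    for n in range(len(x)):
        print(n, transversal_output(w, x, n))
```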

In some applications, such as beamforming (Section 1.6.4), the filter tap inputs are not 
the delayed samples of a single input. In such cases, the structure of the adaptive filter 
assumes the form shown in Figure 1.3. This is called a linear combiner as its output is a 
linear combination of the different signals received at its tap inputs: 

$$y(n) = \sum_{i=0}^{N-1} w_i(n)\, x_i(n) \tag{1.2}$$

Note that the linear combiner structure is more general than the transversal. The latter, as 
a special case of the former, can be obtained by choosing x_i(n) = x(n - i). 

Figure 1.2 Adaptive transversal filter. 

Figure 1.3 Adaptive linear combiner. 

The structures of Figures 1.2 and 1.3 are those of the nonrecursive filters, that is, 
computation of filter output does not involve any feedback mechanism. We also refer to 
Figure 1.2 as a finite-impulse response (FIR) filter as its impulse response is of finite dura- 
tion in time. An infinite-impulse response (IIR) filter is governed by recursive equations 
such as (Figure 1.4) 


$$y(n) = \sum_{i=0}^{N-1} a_i(n)\, x(n-i) + \sum_{i=1}^{M-1} b_i(n)\, y(n-i) \tag{1.3}$$

where a_i(n) and b_i(n) are the forward and feedback tap weights, respectively. IIR filters 
have been used in many applications. However, as we shall see in the later chapters, 
because of the many difficulties involved in the adaptation of IIR filters, their application 
in the area of adaptive filters is rather limited. In particular, they can easily become 
unstable because their poles may get shifted out of the unit circle (i.e., |z| = 1, in the 
z-plane, Chapter 2) by the adaptation process. Moreover, the performance function (e.g., 
MSE as a function of filter coefficients) of an IIR filter usually has many local minima 
points. This may result in convergence of the filter to one of the local minima and not to 
the desired global minimum point of the performance function. On the contrary, the MSE 
functions of FIR filter and linear combiner are well-behaved quadratic functions with a 
single minimum point, which can easily be found through various adaptive algorithms. 
Because of these points, the nonrecursive filters are the sole candidates in most of the 
applications of adaptive filters. Hence, most of our discussions in the subsequent chapters 
are limited to the nonrecursive filters. IIR adaptive filters, with two specific examples 
of their applications, are discussed in Chapter 10. 
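
The recursion of Equation (1.3) and the stability issue mentioned above can be illustrated with a short Python sketch. The function name is hypothetical, the tap weights are held fixed, and all signals are assumed to be zero before time zero. A single feedback weight b_1 places a pole at z = b_1, so a value of magnitude greater than one gives a growing impulse response.

```python
def iir_filter(a, b, x):
    """Run the recursion y(n) = sum_i a_i x(n-i) + sum_i b_i y(n-i).

    a : forward weights a_0, ..., a_{N-1}
    b : feedback weights b_1, ..., b_{M-1} (b[0] holds b_1)
    x : input samples; signals are assumed zero before n = 0
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, a_i in enumerate(a):
            if n - i >= 0:
                acc += a_i * x[n - i]
        for i, b_i in enumerate(b, start=1):  # feedback starts at y(n-1)
            if n - i >= 0:
                acc += b_i * y[n - i]
        y.append(acc)
    return y


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 9
    print(iir_filter([1.0], [0.5], impulse))  # pole inside the unit circle
    print(iir_filter([1.0], [1.1], impulse))  # pole outside: response grows
```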

The FIR and IIR structures shown in Figures 1.2 and 1.4 are obtained by direct realiza- 
tion of the respective difference equations (1.1) and (1.3). These filters may alternatively 
be implemented using the lattice structures. The lattice structures, in general, are more 
complicated than the direct implementations. However, in certain applications, they have 
some advantages which make them better candidates than the direct forms. For instance, in 
the application of linear prediction for speech processing where we need to realize all-pole 
(IIR) filters, the lattice structure can be more easily controlled to prevent possible instability 
of the filter. Derivations of lattice structures for both FIR and IIR filters are presented in 
Chapter 11. Also, in the implementation of the method of least-squares (Section 1.4.2), the 
use of lattice structure leads to a computationally efficient algorithm known as recursive 
least-squares (RLS) lattice. A derivation of this algorithm is presented in Chapter 13. 

The FIR and IIR filters which were discussed above are classified as linear filters 
because their outputs are obtained as linear combinations of the present and past samples 
of input and, in the case of IIR filter, the past samples of the output also. Although most 
applications are restricted to the use of linear filters, nonlinear adaptive filters become 
necessary in some applications where the underlying physical phenomena to be modeled 
are far from being linear. A typical example is magnetic recording where the recording 
channel becomes nonlinear at high densities because of the interaction among the magne- 
tization transitions written on the medium. The Volterra series representation of systems 
is usually used in such applications. The output, y(n), of a Volterra system is related to 
its input, x(n), according to the equation 

$$
y(n) = w_{0,0}(n) + \sum_{i} w_{1,i}(n)\, x(n-i) + \sum_{i,j} w_{2,i,j}(n)\, x(n-i)\, x(n-j) + \sum_{i,j,k} w_{3,i,j,k}(n)\, x(n-i)\, x(n-j)\, x(n-k) + \cdots \tag{1.4}
$$

where w_{0,0}(n), the w_{1,i}(n)'s, the w_{2,i,j}(n)'s, the w_{3,i,j,k}(n)'s, ... are filter coefficients. In this book, 
we do not discuss the Volterra filters any further. However, we note that all the summations 
in Eq. (1.4) may be put together and the Volterra filter may be thought of as a linear 
combiner whose inputs are determined by the delayed samples of x(n) and their cross-multiplications. 
Noting this, we find that the extension of most of the adaptive filtering 
algorithms to the Volterra filters is straightforward. 

Figure 1.4 The structure of an IIR filter. 
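
The linear-combiner view of the Volterra filter can be sketched in Python as follows. This is only an illustration under stated assumptions: the series is truncated at a chosen order, the memory length is finite, symmetric products such as x(n-i)x(n-j) and x(n-j)x(n-i) are merged into a single input, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def volterra_inputs(x, n, memory, order=2):
    """Build the linear-combiner input vector of a truncated Volterra filter.

    The vector holds the constant 1, the delayed samples x(n-i), and the
    products of delayed samples up to the given order. Samples before
    time 0 are taken as zero (an assumption).
    """
    delayed = [x[n - i] if n - i >= 0 else 0.0 for i in range(memory)]
    inputs = [1.0]  # input multiplying the constant coefficient w_{0,0}
    for k in range(1, order + 1):
        for idx in combinations_with_replacement(range(memory), k):
            prod = 1.0
            for i in idx:
                prod *= delayed[i]
            inputs.append(prod)
    return inputs


def linear_combiner(weights, inputs):
    """Output of a linear combiner: weighted sum of its tap inputs."""
    return sum(w * u for w, u in zip(weights, inputs))


if __name__ == "__main__":
    x = [2.0, 3.0]
    u = volterra_inputs(x, n=1, memory=2, order=2)
    print(u)  # [1, x(n), x(n-1), x(n)^2, x(n)x(n-1), x(n-1)^2]
    print(linear_combiner([0.1, 1.0, 0.5, 0.0, 0.2, 0.0], u))
```

Because the output is linear in the weights, any algorithm that adapts a linear combiner can in principle adapt these weights as well, at the cost of an input vector that grows quickly with memory and order.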

1.4 Adaptation Approaches 


As introduced in Sections 1.1 and 1.2, there are two distinct approaches that have 
been widely used in the development of various adaptive algorithms, viz. stochastic and 
deterministic. Both these approaches have many variations in their implementations lead- 
ing to a rich variety of algorithms; each of which offers desirable features of its own. In 
this section, we present a review of these two approaches and highlight the main features 
of the related algorithms. 


1.4.1 Approach Based on Wiener Filter Theory 


According to the Wiener filter theory, which comes from the stochastic framework, 
the optimum coefficients of a linear filter are obtained by minimization of its MSE. As 
was noted before, strictly speaking, the minimization of MSE requires certain statistics 
obtained through ensemble averaging, which may not be possible in practical applications. 
The problem is resolved using ergodicity so as to use time averages instead of ensemble 
averages. Furthermore, to come up with simple recursive algorithms, very rough estimates 
of the required statistics are used. In fact, the celebrated least-mean-square (LMS) algorithm, 
which is the most basic and widely used algorithm in various adaptive filtering 
applications, uses the instantaneous value of the square of the error signal as an estimate 
of the MSE. It turns out that this very rough estimate of the MSE, when used with a 
small step-size parameter in searching for the optimum coefficients of the Wiener filter, 
leads to a very simple and yet reliable adaptive algorithm. 
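
To make the idea concrete, the following Python sketch applies an LMS-type recursion to a system modeling problem. This section describes the algorithm only in words, so the update used here, w(n+1) = w(n) + 2 mu e(n) x(n), is the commonly quoted form obtained by stepping against the gradient of the instantaneous squared error; it should be read as an assumption until the derivation in Chapter 6. The function name, the two-tap plant, and the step size are likewise chosen only for illustration.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Adapt a transversal filter with an LMS-type recursion.

    Returns the final tap weights and the sequence of output errors.
    Input samples before time 0 are taken as zero (an assumption).
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        taps = [x[n - i] if n - i >= 0 else 0.0 for i in range(num_taps)]
        y = sum(w_i * u_i for w_i, u_i in zip(w, taps))
        e = d[n] - y
        # The gradient of e(n)^2 with respect to w is -2 e(n) x(n);
        # move a small step (mu) in the opposite direction.
        w = [w_i + 2.0 * mu * e * u_i for w_i, u_i in zip(w, taps)]
        errors.append(e)
    return w, errors


if __name__ == "__main__":
    random.seed(0)
    x = [random.gauss(0.0, 1.0) for _ in range(3000)]  # white input
    # Hypothetical plant to be modeled, with weak observation noise.
    d = [0.8 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0)
         + random.gauss(0.0, 0.01) for n in range(len(x))]
    w, errors = lms_identify(x, d, num_taps=2, mu=0.01)
    print(w)
```

Each iteration costs only a few multiplications per tap and stores nothing beyond the current tap inputs, which is the simplicity referred to above.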

The main disadvantage of the LMS algorithm is that its convergence behavior is highly 
dependent on the power spectral density of the filter input. When the filter input is white, 
that is, its power spectrum is flat across the whole range of frequencies, the LMS algorithm 
converges very fast. However, when certain bands of frequencies are not well excited 
(i.e., the signal energy in those bands is relatively low), some slow modes of convergence 
appear, thus resulting in very slow convergence compared to the case of white input. 
In other words, to converge fast, the LMS algorithm requires equal excitation over the 
whole range of frequencies. Noting this, over the years, researchers have developed many 
algorithms that effectively divide the frequency band of the input signal into a number of 
subbands and achieve some degree of signal whitening using some power normalization 
mechanism before applying the adaptive algorithm. These algorithms, which appear in 
different forms, are presented in Chapters 7, 9, and 11. 

In some applications, we need to use adaptive filters whose length exceeds a few hundred 
or even a few thousand taps. Clearly, such filters are computationally expensive 
to implement. An effective way of implementing such filters at a much lower computa- 
tional complexity is to use the fast Fourier transform (FFT) algorithm to implement the 
time domain convolutions in the frequency domain, as is commonly done in the implemen- 
tation of long digital filters (Oppenheim and Schafer, 1975, 1989). Adaptive algorithms 
which use FFT for reducing computational complexity are presented in Chapter 8. 
1.4.2 Method of Least-Squares 


The adaptive filtering algorithms whose derivations are based on the Wiener filter theory 
have their origin in a statistical formulation of the problem. In contrast to this, the method 
of least-squares approaches the problem of filter optimization from a deterministic point 
of view. As mentioned before, in the Wiener filter theory, the desired filter is obtained by 
minimizing the MSE, that is, a statistical quantity. In the method of least-squares, on the 
other hand, the performance index is the sum of weighted error squares for the given data, 
that is, a deterministic quantity. A consequence of this deterministic approach (that will 
become clear as we go through its derivation in Chapter 12) is that the least-squares-based 
algorithms, in general, converge much faster than the LMS-based algorithms. They are 
also insensitive to the power spectral density of the input signal. The price that is paid 
for achieving this improved convergence performance is higher computational complexity 
and poorer numerical stability. 

Direct formulation of the least-squares problem results in a matrix formulation of its 
solution which can be applied on a block-by-block basis to the incoming signals. This, which 
is referred to as block estimation of the least-squares method, has some useful applications 
in areas such as linear predictive coding (LPC) of speech signals. However, in the context 
of adaptive filters, recursive formulations of the least-squares method that update the filter 
coefficients after the arrival of every sample of input are preferred because of the reasons 
that were given in Section 1.2. There are three major classes of RLS adaptive filtering 
algorithms, as follows: 


Standard RLS algorithm. The derivation of this algorithm involves the use of a well- 
known result from linear algebra known as the matrix inversion lemma. Consequently, 
the implementation of the standard RLS algorithm involves matrix manipulations that 
result in a computational complexity proportional to the square of the filter length. 

QR-decomposition-based RLS (QRD-RLS) algorithm. This formulation of RLS algorithm 
also involves matrix manipulations, which leads to a computational complexity that 
grows with the square of the filter length. However, the operations involved here are 
such that they can be put into some regular structures known as systolic arrays. Another 
important feature of the QRD-RLS algorithm is its robustness to numerical errors 
compared to other types of RLS algorithms (Haykin, 1991, 1996). 

Fast RLS algorithms. In the case of transversal filters, the tap inputs are successive samples 
of input signal, x(n) (Figure 1.2). The fast RLS algorithms use this property of the 
filter input and solve the problem of least-squares with a computational complexity, 
which is proportional to the length of the filter, thus the name fast RLS. Two types of 
fast RLS algorithms may be recognized: 

RLS lattice algorithms. These lattice algorithms involve the use of order-update as well 
as the time-update equations. A consequence of this feature is that it results in mod- 
ular structures, which are suitable for hardware implementations using the pipelining 
technique. Another desirable feature of these algorithms is that certain variants of them 
are very robust against numerical errors arising from the use of finite word lengths 
in computations. 

Fast transversal RLS algorithm. In terms of number of operations per iteration, the fast 
transversal RLS algorithm is less complex than the lattice RLS algorithms. However, it 
suffers from numerical instability problems that require careful attention to prevent 
undesirable behavior in practice. 


In this book, we present a complete treatment of the various LMS-based algorithms 
in seven chapters. However, our discussion on RLS algorithms is rather limited. We 
present a comprehensive treatment of the properties of the method of least-squares and a 
derivation of the standard RLS algorithm in Chapter 12. The basic results related to the 
development of fast RLS algorithms and some examples of such algorithms are presented 
in Chapter 13. A study of the tracking behavior of selected adaptive filtering algorithms is 
presented in Chapter 14 of this book. The use of these algorithms in various applications 
is discussed in Chapters 15 through 20. 


1.5 Real and Complex Forms of Adaptive Filters 


There are some practical applications in which the filter input and its desired signal 
are complex-valued. A good example of this situation appears in digital data transmis- 
sion, where the most widely used signaling techniques are phase shift keying (PSK) and 
quadrature-amplitude modulation (QAM). In this application, the baseband signal consists 
of two separate components, which are the real and imaginary parts of a complex-valued 
signal. Moreover, in the case of frequency domain implementation of adaptive filters 
(Chapter 8) and subband adaptive filters (Chapter 9), we will be dealing with complex- 
valued signals, even though the original signals may be real-valued. Thus, we find cases 
where the formulation of the adaptive filtering algorithms must be given in terms of 
complex-valued variables. 

In this book, to keep our presentation as simple as possible, most of the derivations are 
given for real-valued signals. However, wherever necessary, the extensions to 
complex forms are also given. 


1.6 Applications 


Adaptive filters by their very nature are self-designing systems that can adjust themselves 
to different environments. As a result, adaptive filters find applications in such diverse 
fields as control, communications, radar and sonar signal processing, interference can- 
cellation, active noise control (ANC), biomedical engineering, and so on. The common 
feature of these applications that brings them under the same basic formulation of adaptive 
filtering is that they all involve a process of filtering some input signal to match a desired 
response. The filter parameters are updated by making a set of measurements of the under- 
lying signals and applying that to the adaptive filtering algorithm such that the difference 
between the filter output and the desired response is minimized in either a statistical or a 
deterministic sense. In this context, four basic classes of adaptive filtering applications 
are recognized, namely, modeling, inverse modeling, linear prediction, and interference 
cancellation. In the rest of this chapter, we present an overview of these applications. 


1.6.1 Modeling 


Figure 1.5 depicts the problem of modeling in the context of adaptive filters. The aim is 
to estimate the parameters of the model, W(z), of a plant, G(z). On the basis of some 
a priori knowledge of the plant, G(z), a transfer function, W(z), with a certain number 
of adjustable parameters is selected first. The parameters of W(z) are then chosen by an 
adaptive filtering algorithm such that the difference between the plant output, d(n), and 
the adaptive filter output, y(n), is minimized. 

Figure 1.6 Block diagram of a self-tuning regulator. 

An application of modeling that readily comes to mind is system identification. 
In most modern control systems, the plant under control is identified on-line and the result 
is used in a self-tuning regulator (STR) loop, as depicted in Figure 1.6 (see, e.g., Astrom 
and Wittenmark (1980)). 
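
The modeling setup described above can be illustrated with a minimal sketch. It assumes an LMS-type update with step-size parameter mu, a three-tap FIR plant, and a white Gaussian input; the plant coefficients, the function name lms_identify, and all numerical values are hypothetical choices made for illustration only, not values taken from this book.

```python
import random

def lms_identify(x, d, num_taps, mu):
    """Adapt an FIR model W(z) so that its output y(n) tracks the plant output d(n)."""
    w = [0.0] * num_taps      # adjustable parameters of W(z)
    buf = [0.0] * num_taps    # tap inputs x(n), x(n-1), ..., x(n-num_taps+1)
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))             # model output y(n)
        e = dn - y                                             # error d(n) - y(n)
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, buf)]   # LMS-type update
    return w

random.seed(0)
plant = [0.5, -0.3, 0.2]      # hypothetical plant G(z), unknown to the adaptive filter
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(g * x[n - k] for k, g in enumerate(plant) if n - k >= 0)
     for n in range(len(x))]
w = lms_identify(x, d, num_taps=3, mu=0.01)
print("plant:", plant)
print("model:", [round(wi, 4) for wi in w])
```

In this noise-free setting with a model as long as the plant, the model coefficients approach those of the plant. With measurement noise or a model shorter than the plant, only an approximation would be obtained.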

Another application of modeling is echo cancellation. In this application, an adaptive 
filter is used to identify the impulse response of the path between the source from which 
the echo originates and the point where the echo appears. The output of the adaptive 
filter, which is an estimate of the echo signal, can then be used to cancel the unde- 
sirable echo. The subject of echo cancellation is discussed further under the topic of 
interference cancellation. 

Nonideal characteristics of communication channels often result in some distortion in 
the received signals. To mitigate such distortion, channel equalizers are usually used. 
This technique, which is equivalent to implementing the inverse of the channel response, 
is discussed in the following under the topic of inverse modeling. Direct modeling of the 
channel, however, has also been found useful in some implementations of data receivers. 
For instance, data receivers equipped with maximum-likelihood detectors require an 
estimate of the channel response (Proakis, 1995). Furthermore, computation of equalizer 
coefficients from the channel response has been proposed by some researchers because this 
technique has been found to result in better tracking of time-varying channels (Fechtel and 
Meyr (1991) and Farhang-Boroujeny and Wang (1995)). In such applications, a training 
pattern is transmitted at the beginning of every connection. The received signal, which 
acts as the desired signal to an adaptive filter, is used in a setup such as that shown in 
Figure 1.7 to identify the channel. Once the channel is identified and the normal mode of 
transmission begins, the detected data symbols, $\hat{s}(n)$, are used as input to the channel 
model and the adaptation process continues for tracking possible variations of the channel. 
This is known as the decision-directed mode and is also shown in Figure 1.7. 

Figure 1.7 An adaptive data receiver using channel identification. 


1.6.2 Inverse Modeling 


Inverse modeling, also known as deconvolution, is another application of adaptive filters 
that has found extensive use in various engineering disciplines. The most widely used 
application of inverse modeling is in communications, where an inverse model (also called 
an equalizer) is used to mitigate the channel distortion. The concept of inverse modeling has 
also been applied to adaptive control systems where a controller is to be designed and 
cascaded with a plant so that the overall response of this cascade matches a desired (target) 
response (Widrow and Stearns, 1985). The process of prediction, which is explained 
later, may also be viewed as an inverse modeling scheme (Section 1.6.3). In this section, 
we concentrate on the application of inverse modeling in channel equalization. The full 
treatment of the subject of channel equalization is presented in Chapter 17. 

Figure 1.8 A baseband data transmission system with channel equalizer. 


Channel Equalization 


Figure 1.8 depicts the block diagram of a baseband transmission system equipped with a 
channel equalizer. Here, the channel represents the combined response of the transmitter 
filter, the actual channel, and the receiver front-end filter. The additive noise sequence, 
v(n), arises from thermal noise in the electronic circuits and possible crosstalk from 
neighboring channels. The transmitted data symbols, s(n), which appear in the form of 
amplitude/phase modulated pulses, are distorted by the channel. The most significant 
among the different distortions is the pulse-spreading effect, which results because the 
channel impulse response is not an ideal impulse function but is instead a response 
that is nonzero over many symbol periods. This distortion results in interference of 
neighboring data symbols with one another, thereby making detection through 
a simple threshold detector unreliable. The phenomenon of interference among neighboring 
data symbols is known as intersymbol interference (ISI). The presence of the additive 
noise samples, v(n), further deteriorates the performance of data receivers. The role of the 
equalizer, as a filter, is to resolve the distortion introduced by the channel (i.e., rejection or 
minimization of ISI), while minimizing the effect of additive noise at the threshold detector 
input (equalizer output) as much as possible. If the additive noise could be ignored, the 
task of the equalizer would be rather straightforward. For a channel H(z), an equalizer with 
transfer function W(z) = 1/H(z) could do the job perfectly, as this results in an overall 
channel-equalizer transfer function H(z)W(z) = 1, which implies that the transmitted data 
sequence, s(n), will appear at the detector input without any distortion. Unfortunately, 
this is an ideal situation that cannot be realized in most practical applications. 

We note that the inverse of the channel transfer function, that is, 1/H(z), may be noncausal 
if H(z) happens to have a zero outside the unit circle, thus making it unrealizable in 
practice. This problem is solved by selecting the equalizer so that $H(z)W(z) \approx z^{-\Delta}$, where 
$\Delta$ is an appropriate integer delay. This is equivalent to saying that a delayed replica of the 
transmitted symbols appears at the equalizer output. Example 3.4 of Chapter 3 clarifies 
the concept of noncausality of 1/H(z) and also the way the problem is (approximately) 
solved by introducing a delay, $\Delta$. Further details appear in Chapter 17. 

We also note that the choice of $W(z) = 1/H(z)$ (or $W(z) \approx z^{-\Delta}/H(z)$) may lead to a 
significant enhancement of the additive noise, v(n), in those frequency bands where the 
magnitude of H(z) is small (i.e., 1/H(z) is large). Hence, in choosing an equalizer, W(z), 
one should keep a balance between residual ISI and noise enhancement at the equalizer 
output. The Wiener filter is a solution with such a balance (Chapter 3, Section 3.6.4). 
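
The noise enhancement caused by inverting the channel can be made concrete with a small numerical sketch. The two-tap channel H(z) = 1 + 0.9 z^-1 used here is a hypothetical example chosen because its magnitude response is small near omega = pi; the function name freq_response is likewise only illustrative.

```python
import cmath
import math

def freq_response(h, omega):
    """Evaluate H(e^{j omega}) for FIR coefficients h[0], h[1], ..."""
    return sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))

h = [1.0, 0.9]   # hypothetical channel H(z) = 1 + 0.9 z^-1
for omega in (0.0, math.pi / 2, math.pi):
    mag = abs(freq_response(h, omega))
    gain = 1.0 / mag                  # magnitude of the inverse equalizer 1/H
    print(f"omega = {omega:.3f}  |H| = {mag:.3f}  "
          f"|1/H| = {gain:.2f} ({20 * math.log10(gain):.1f} dB)")
```

Where |H| is small, the inverse equalizer amplifies the noise in that band by the factor |1/H|, which is the effect that an equalizer balancing ISI against noise is meant to limit.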

Figure 1.9 Details of a baseband data transmission system equipped with an adaptive channel 
equalizer. 


Figure 1.9 presents the details of a baseband transmission system, equipped with an 
adaptive equalizer. The equalizer is usually implemented in the form of a transversal 
filter. Initial training of the equalizer requires knowledge of the transmitted data symbols 
as they (to be more accurate, a delayed replica of them) should be used as the desired 
signal samples for adaptation of the equalizer tap weights. This follows from the fact 
that the equalizer output should ideally be the same as the transmitted data symbols. We 
thus require an initialization period during which the transmitter sends a sequence of 
training symbols that are known to the receiver. This is called the training mode. Training 
symbols are usually specified as part of the standards, and the manufacturers of data 
modems¹ should comply with these so that the modems of different manufacturers can 
communicate with one another. 

At the end of the training mode, the tap weights of the equalizer will have converged 
close to their optimal values. The detected symbols will then match the transmitted 
symbols with a probability close to 1. From then onward, the detected symbols 
can be treated as the desired signal for further adaptation of the equalizer so that possible 
variations of the channel can be tracked. This mode of operation of the equalizer is called 
the decision-directed mode. The decision-directed mode works successfully as long as the 
channel variation is slow enough for the adaptation algorithm to follow it 
satisfactorily. This is necessary to ensure low symbol-error rates in detection so that 
the detected symbols can still be used as the desired signal. 

The inverse modeling discussed previously defines the equalizer as an approximation 
of $z^{-\Delta}/H(z)$, that is, the target/desired response of the cascade of channel and equalizer 
is $z^{-\Delta}$, a pure delay. This can be generalized by replacing the target response $z^{-\Delta}$ by 
a general target response, say $\Gamma(z)$. In fact, to achieve higher efficiency in the usage of 
the available bandwidth, some special choices of $\Gamma(z) \neq z^{-\Delta}$ are usually considered in 
communication systems. Systems that incorporate such nontrivial target responses are 
referred to as partial-response signaling systems. The detector in such systems is no longer 
the simple threshold detector, but one that can exploit the information that the overall 
channel is now $\Gamma(z)$, instead of the trivial memoryless channel $z^{-\Delta}$. The Viterbi detector 
(Proakis, 1995) is an example of such a detector. The target response, $\Gamma(z)$, is selected 
so that its magnitude response approximately matches the channel response, that is, 
$|\Gamma(e^{j\omega})| \approx |H(e^{j\omega})|$, over the range of frequencies of interest. The impact of this choice 
is that the equalizer, which is now $W(z) \approx \Gamma(z)/H(z)$, has a magnitude response that 
is approximately equal to 1, thereby minimizing the noise enhancement. To clarify this 
further and also to mention another application of inverse modeling, we discuss the 
problem of magnetic recording next. 


1 The term modem, which is an abbreviation for "modulator and demodulator," is commonly used to refer to data 
transceivers (transmitter and receiver). 


Magnetic Recording 


The process of writing data bits on a magnetic medium (tape or disk) and reading them 
back later is similar to sending data bits over a communication channel from one end 
of a transmission line and receiving them at the other end of the line. The data bits, 
which are converted to signal pulses before recording, undergo some distortion because 
of the imperfect behavior of the head and medium, as happens in communication channels 
because of the nonideal response of the channel. Additive thermal noise and interference 
from neighboring recording tracks (just like neighboring channels in communications) are 
also present in magnetic recording channels (Bergmans, 1996). 

Magnetic recording channels are usually characterized by their response to an isolated 
pulse of width equal to 1-bit interval, T. This is known as the dibit response and, in the case of 
hard-disk channels, it is usually modeled by the superposition of a positive and a negative 
Lorentzian pulse, separated by 1-bit interval, T. In other words, the Lorentzian pulse 
models the step response of the channel. The Lorentzian pulse is defined as 

$$g_a(t) = \frac{1}{1 + (2t/t_{50})^2} \tag{1.5}$$

where $t_{50}$ is the pulse width measured at 50% of its maximum amplitude. The subscript 
"a" in $g_a(t)$ and other functions that appear in the rest of this subsection emphasizes 
that they are analog (nonsampled) signals. The ratio $D = t_{50}/T$ is known as the recording 
density. Typical values of D are in the range of 1 to 3. A higher density means more bits 
are contained in one $t_{50}$ interval, that is, more ISI. We may also note that $t_{50}$ is a temporal 
measure of the recording density. When measured spatially, we obtain another parameter 
$\mathrm{PW}_{50} = v\,t_{50}$, where v is the velocity of the medium with respect to the head. Accordingly, 
for a given speed, v, the value of D specifies the actual number of bits written on a length 
$\mathrm{PW}_{50}$ along the track on the magnetic medium. 

Using Eq. (1.5), the dibit response of a hard-disk channel is obtained as 

$$h_a(t) = g_a(t) - g_a(t - T) \tag{1.6}$$

The response of the channel to a sequence s(n) of data bits is then given by the 
convolution sum 

$$u_a(t) = \sum_n s(n)\, h_a(t - nT) \tag{1.7}$$

Thus, the dibit response, $h_a(t)$, is nothing but the impulse response of the 
recording channel. 
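
As a rough numerical illustration of the dibit response and the read-back signal, the following sketch assumes the Lorentzian pulse in its usual form 1/(1 + (2t/t50)^2), whose width at half of its peak value is t50. The function names, the bit interval T = 1, and the sample instants are illustrative assumptions.

```python
def lorentzian(t, t50):
    """Lorentzian pulse with unit peak and width t50 at half amplitude (assumed form)."""
    return 1.0 / (1.0 + (2.0 * t / t50) ** 2)

def dibit(t, t50, T):
    """Dibit response: a positive and a negative Lorentzian pulse separated by T."""
    return lorentzian(t, t50) - lorentzian(t - T, t50)

def readback(t, bits, t50, T):
    """Read-back signal as the superposition of shifted dibit responses."""
    return sum(s * dibit(t - n * T, t50, T) for n, s in enumerate(bits))

T = 1.0                               # bit interval (normalized)
for D in (1, 2, 3):                   # recording density D = t50 / T
    t50 = D * T
    samples = [round(dibit(k * T, t50, T), 3) for k in range(-3, 5)]
    print(f"D = {D}: dibit samples at t = -3T ... 4T -> {samples}")
```

The printed samples spread over more bit intervals as D grows, which is the time-domain view of the increased ISI at higher densities.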

Figures 1.10a and b show the dibit (time domain) and magnitude (frequency domain) 
responses, respectively, of the magnetic channels (based on the Lorentzian model) for 
densities D = 1, 2, and 3. From Figure 1.10b, we note that most of the energy in the 
read-back signals is concentrated in a midband range between zero and an upper limit 
around $1/2T$. Clearly, the bandwidth increases with increase in density. In the light of 
our previous discussions, we may thus choose the target response, $\Gamma(z)$, of the equalizer 
so that it resembles a bandpass filter whose bandwidth and magnitude response are close 
to those of the Lorentzian dibit responses. In magnetic recording, the most commonly 
used partial responses (i.e., target responses) are given by the class-IV response 

$$\Gamma(z) = z^{-\Delta}(1 + z^{-1})^K (1 - z^{-1}) \tag{1.8}$$

where $\Delta$, as before, is an integer delay and K is an integer greater than or equal to 1. 
As the recording density increases, higher values of K will be required to match the 
channel characteristics. But, as K increases, the channel length also increases, implying 
higher complexity in the detector. In Chapter 10, we elaborate on these aspects of 
partial-response systems. 
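
To see how the length of the target response grows with K, the following sketch expands the polynomial (1 + z^-1)^K (1 - z^-1), which is the class-IV form assumed here with the delay term left out. The helper names poly_mul and class4_target are illustrative.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def class4_target(K):
    """Coefficients of (1 + z^-1)^K (1 - z^-1) in increasing powers of z^-1."""
    taps = [1, -1]                    # the factor (1 - z^-1)
    for _ in range(K):
        taps = poly_mul(taps, [1, 1])  # one factor (1 + z^-1) per unit of K
    return taps

for K in (1, 2, 3):
    print(f"K = {K}: {class4_target(K)}")
```

The target has K + 2 coefficients, so each increment of K adds one symbol of memory that the detector has to account for.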


1.6.3 Linear Prediction 


Prediction is a spectral estimation technique that is used for modeling correlated random 
processes for the purpose of finding a parametric representation of these processes. In 
general, different parametric representations could be used to model the processes. In 
the context of linear prediction, the model used is shown in Figure 1.11. Here, the random 
process, x(n), is assumed to be generated by exciting the filter G(z) with the input 
u(n). As G(z) is an all-pole filter, this is known as autoregressive (AR) modeling. The 
type of the excitation signal, u(n), is application dependent and may vary depending 
on the nature of the process being modeled. However, it is usually chosen to be a 
white process. 

Other models used for parametric representation are moving average (MA) models, 
where G(z) is an all-zero (transversal) filter, and autoregressive-moving average (ARMA) 
models, where G(z) has both poles and zeros. However, the AR model is more 
popular than the other two. 

Figure 1.10 Time and frequency domain responses of magnetic recording channels for densities 
D = 1, 2, and 3 modeled using the Lorentzian pulse: (a) dibit response; (b) magnitude response of 
dibit response. 

Figure 1.11 Autoregressive modeling of a random process. 

The rationale behind the use of AR modeling may be explained as follows. As the 
samples of any given nonwhite random signal, x(n), are correlated with one another, 
these correlations could be used to make a prediction of the present sample of the 
process, x(n), in terms of its past samples, x(n − 1), x(n − 2), ..., x(n − N), as shown 
in Figure 1.12. Intuitively, such prediction improves as the predictor length increases. 
However, the improvement obtained may become negligible once the predictor length, 
N, exceeds a certain value, which depends on the extent of correlation in the given 
process. The prediction error, e(n), will then be approximately white. We now note that the 
transfer function between the input process, x(n), and the prediction error, e(n), is 

$$H(z) = 1 - \sum_{i=1}^{N} a_i z^{-i} \tag{1.9}$$

where the $a_i$'s are the predictor coefficients. Now, if a white process, u(n), with statistics 
similar to those of e(n) is passed through an all-pole filter with the transfer function 

$$G(z) = \frac{1}{1 - \sum_{i=1}^{N} a_i z^{-i}} \tag{1.10}$$

as shown in Figure 1.11, the generated output will clearly be a process with the 
same statistics as x(n). 
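
The inverse relationship between the all-pole model and the prediction-error filter can be checked numerically. The sketch below assumes the forms G(z) = 1/(1 - sum of a_i z^-i) and H(z) = 1 - sum of a_i z^-i, zero initial conditions, and a hypothetical stable second-order coefficient set; the function names are illustrative.

```python
import random

def all_pole(u, a):
    """AR synthesis: x(n) = u(n) + sum_i a[i-1] * x(n-i), zero initial conditions."""
    x = []
    for n, un in enumerate(u):
        x.append(un + sum(ai * x[n - i]
                          for i, ai in enumerate(a, start=1) if n - i >= 0))
    return x

def prediction_error(x, a):
    """Prediction error: e(n) = x(n) - sum_i a[i-1] * x(n-i)."""
    return [xn - sum(ai * x[n - i]
                     for i, ai in enumerate(a, start=1) if n - i >= 0)
            for n, xn in enumerate(x)]

random.seed(1)
a = [1.2, -0.6]                       # hypothetical stable AR(2) coefficients
u = [random.gauss(0.0, 1.0) for _ in range(200)]   # white excitation u(n)
x = all_pole(u, a)                    # correlated process x(n)
e = prediction_error(x, a)            # recovers the excitation
print(max(abs(ei - ui) for ei, ui in zip(e, u)))
```

When the predictor coefficients equal those of the model, the prediction error reproduces the white excitation up to rounding, which is why a converged predictor yields an approximately white e(n).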

With the background developed above, we are now ready to discuss a few applications 
of adaptive prediction. 


Autoregressive Spectral Analysis 


In certain applications, we need to estimate the power spectrum of a random process. A 
straightforward way of obtaining such an estimate is to take the Fourier transform (discrete 
Fourier transform (DFT) in the case of discrete-time processes) of the observed samples and 
use some averaging (smoothing) technique to improve the estimate. This comes under the 
class of nonparametric spectral estimation techniques (Kay, 1988). When the number of 
samples of the input is limited, the estimates provided by nonparametric spectral estimation 
techniques become unreliable. In such cases, parametric spectral estimation, as explained 
above, may give more reliable estimates. 

Figure 1.12 Linear predictor. 

As mentioned already, parametric spectral estimation could be done using AR, 
MA, or ARMA models (Kay, 1988). In the case of AR modeling, we proceed as follows. 
We first choose a proper order, N, for the model. The observed sequence, x(n), is 
then applied to a predictor structure similar to Figure 1.12 whose coefficients, the $a_i$'s, are 
optimized by minimizing the prediction error, e(n). Once the predictor coefficients have 
converged, an estimate of the power spectral density of x(n) is obtained according to the 
following equation: 

$$\Phi_{xx}(e^{j\omega}) = N_o \frac{1}{\left|1 - \sum_{i=1}^{N} a_i e^{-j\omega i}\right|^2} \tag{1.11}$$

where $N_o$ is an estimate of the power of the prediction error, e(n). This follows from 
the model of Figure 1.11 and the fact that after the convergence of the predictor, e(n) 
is approximately white. For further explanation on the derivation of Eq. (1.11) from the 
signal model of Figure 1.11, refer to Chapter 2 (Section 2.4.4). 
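
Once predictor coefficients and an error-power estimate are available, the AR spectral estimate can be evaluated on any frequency grid. The sketch below assumes the estimate has the form of the error power divided by the squared magnitude of 1 - sum of a_i e^{-j omega i}; the first-order coefficient 0.9 and the unit error power are hypothetical values.

```python
import cmath
import math

def ar_psd(a, noise_power, omega):
    """AR spectral estimate: noise_power / |1 - sum_i a[i-1] e^{-j omega i}|^2."""
    denom = 1.0 - sum(ai * cmath.exp(-1j * omega * i)
                      for i, ai in enumerate(a, start=1))
    return noise_power / abs(denom) ** 2

a = [0.9]                             # hypothetical converged AR(1) predictor
for k in range(5):
    omega = k * math.pi / 4
    print(f"omega = {omega:.3f}  PSD = {ar_psd(a, 1.0, omega):.3f}")
```

For this lowpass example, the estimate is largest at omega = 0 and falls off toward omega = pi, as expected for a strongly positively correlated process.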


Adaptive Line Enhancement 


Adaptive line enhancement refers to the situation where a narrow-band signal embedded 
in a wide-band signal (usually, white) needs to be extracted. Depending on the application, 
the extracted signal may be the signal of interest, or an unwanted interference that should 
be removed. Examples of the latter case are a spread spectrum signal that has been 
corrupted by a narrow-band signal and biomedical measurement signals that have been 
corrupted by the 50/60 Hz power-line interference. 

The idea of using prediction to extract a narrow-band signal when mixed with a wide-band 
signal follows from the following fundamental result of signal analysis: successive 
samples of a narrow-band signal are highly correlated with one another, whereas there 
is almost no correlation between successive samples of a wide-band process. Because of 
this, if a process x(n) consisting of the sum of a narrow-band and a wide-band process is 
applied to a predictor, the predictor output, $\hat{x}(n)$, will be a good estimate of the 
narrow-band portion of x(n). In other words, the predictor will act as a narrow-band filter, 
which rejects most of the wide-band portion of x(n) and keeps (enhances) the narrow-band 
portion, hence the name line enhancer. Examples of line enhancers can be found in 
Chapters 6 and 10. In particular, in Chapter 10, we find that line enhancers can be best 
implemented using IIR filters. 

We also note that in the applications where the narrow-band portion of x(n) has to be 
rejected (such as the examples mentioned above), the difference between x(n) and $\hat{x}(n)$, 
that is, the estimation error, e(n), is taken as the system output. In this case, the transfer 
function between the input, x(n), and the output, e(n), will be that of a notch filter. 


Speech Coding 


Since the advent of digital signal processing, speech processing has been one of 
the most active research areas. Among the various processing techniques that have been 
applied to speech signals, linear prediction has been found to be the most promising, 
leading to many useful algorithms. In fact, most of the theory of prediction was developed 
in the context of speech processing. 

There are two major speech coding techniques that involve linear prediction (Jayant 
and Noll, 1984). Both techniques aim at reducing the number of bits used for every 
second of speech to save storage and/or transmission bandwidth. The first 
technique, which is categorized under the class of source coders, strives to produce 
digitized voice data at low bit rates in the range of 2 to 10 kb/s. The synthesized speech, 
however, is not of high quality. It sounds synthetic and lacks naturalness. Hence, 
it becomes difficult to recognize the speaker. The second technique, which comes under 
the class of waveform coders, gives much better quality at the cost of a much higher bit 
rate (typically, 32 kb/s). 

Figure 1.13 Speech-production model. 

The main reason for linear prediction being widely used in speech coding is that speech 
signals can be accurately modeled as shown in Figure 1.13. Here, the all-pole filter is 
the vocal-tract model. The excitation to this model, u(n), is either white noise in the 
case of unvoiced sounds (fricatives such as /s/ and /f/), or an impulse train in the case 
of voiced sounds (vowels such as /i/). The period of the impulse train, known as the pitch 
period, and the power of the white noise, known as the excitation level, are parameters of the 
speech model that are to be identified in the coding process. 


Linear Predictive Coding (LPC) 


A speech signal is a highly nonstationary process. The vocal-tract shape undergoes variations 
to generate different sounds in uttering each word. Accordingly, in LPC, to code a speech 
signal, it is first partitioned into segments 10 to 30 ms long. These segments are short 
enough for the vocal-tract shape to be nearly stationary, so that the parameters of the 
speech-production model of Figure 1.13 can be assumed fixed. Then, the following 
steps are used to obtain the parameters of each segment: 


1. Using the predictor structure shown in Figure 1.12, the predictor coefficients, the $a_i$'s, are 
obtained by minimizing the prediction error e(n) in the least-squares sense, for the 
given segment. 

2. The energy of the prediction error e(n) is measured. This specifies the level of exci- 
tation required for synthesizing this segment. 

3. The segment is classified as voiced or unvoiced. 

4. In the case of voiced speech, the pitch period of the segment is measured. 

The following parameters are then stored or transmitted for every segment, as the coded 
speech: (i) predictor coefficients, (ii) energy of excitation signal, (iii) voiced/unvoiced 
classification, and (iv) pitch period in the case of voiced speech. These parameters 
can then (when necessary) be used in a model similar to Figure 1.13 to synthesize the 
speech signal. 
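
The first two analysis steps (predictor coefficients and excitation energy) can be sketched for a single segment as follows. This is only an illustration under stated assumptions: a second-order predictor, the autocorrelation form of least squares solved in closed form, and a synthetic cosine standing in for a 30 ms voiced segment sampled at 8 kHz. Voiced/unvoiced classification and pitch measurement are left out, and the function names are hypothetical.

```python
import math

def autocorr(x, lag):
    """Autocorrelation sum of the segment at the given lag."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))

def lpc2(segment):
    """Order-2 least-squares predictor (autocorrelation method) for one segment."""
    r0, r1, r2 = (autocorr(segment, k) for k in range(3))
    det = r0 * r0 - r1 * r1
    # Solve [[r0, r1], [r1, r0]] [a1, a2]^T = [r1, r2]^T by Cramer's rule.
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    err_energy = r0 - a1 * r1 - a2 * r2   # energy of the prediction error
    return (a1, a2), err_energy

segment = [math.cos(0.3 * n) for n in range(240)]   # synthetic stand-in segment
coeffs, err_energy = lpc2(segment)
print("predictor coefficients:", [round(c, 3) for c in coeffs])
print("error energy / signal energy:", round(err_energy / autocorr(segment, 0), 4))
```

A practical coder would use a higher predictor order and an efficient recursive solver. The point of the sketch is only that each segment is reduced to a few coefficients plus an energy value.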


Waveform Coding 


The most direct way of waveform coding is the standard pulse-code modulation (PCM) 
technique, where the speech signal samples are directly digitized into a prescribed number 
of bits to generate the information bits associated with the coded speech. Direct 
quantization of speech samples requires a relatively large number of bits (usually, 8 bits 
per sample) in order to reconstruct the original speech with an acceptable 
quality. 

A modification of the standard PCM, known as differential pulse-code modulation 
(DPCM), employs a linear predictor such as that of Figure 1.12 and uses the bits associated 
with the quantized samples of the prediction error, e(n), as the coded speech. The rationale 
here is that the prediction error, e(n), has a much smaller variance than the input, x(n). 
Thus, for a given quantization level, e(n) may be quantized with fewer bits 
than x(n). Moreover, as the number of information bits per second of the 
coded speech is directly proportional to the number of bits used per sample, the bit rate of 
DPCM will be lower than that of the standard PCM. 

The prediction filter used in DPCM can be fixed or made adaptive. A DPCM system 
with an adaptive predictor is called adaptive DPCM (ADPCM). In the case of speech 
signals, the use of ADPCM results in superior performance compared to a 
nonadaptive DPCM. In fact, ADPCM has been standardized and is widely used 
in practice (ITU Recommendation G.726). 

Figure 1.14 depicts a simplified diagram of the ADPCM system, as proposed in ITU 
Recommendation G.726.² Here, the predictor is a six-zero, two-pole adaptive IIR filter. 
The coefficients of this filter are adjusted adaptively so that the quantized error $\tilde{e}(n)$ is 
minimized in the mean-square sense. The predictor input $\tilde{x}(n)$ is the same as the original input 
x(n) except for the quantization error in $\tilde{e}(n)$. To understand the joint operation of the 
encoder and decoder shown in Figure 1.14, note that the same signal, $\tilde{e}(n)$, is used as 
input to the predictor structures at the encoder and decoder. Hence, if the stability of 
the loop consisting of the predictor and adaptation algorithm could be guaranteed, then 
the steady-state value of the reconstructed speech at the decoder, that is, $\tilde{x}'(n)$, will be 
equal to that at the encoder, that is, $\tilde{x}(n)$, as unequal initial conditions of the encoder 
and decoder loops will die away after their transient phase. 
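
The way the encoder and decoder stay in step can be seen in a much simpler setting than G.726. The sketch below is a plain DPCM loop with a fixed first-order predictor and a uniform quantizer, not the adaptive six-zero, two-pole predictor of the standard; the predictor coefficient, the step size, and the function names are illustrative assumptions.

```python
import math

def dpcm_encode(x, a, step):
    """Quantize the prediction error; predict from the reconstructed signal."""
    codes, recon, prev = [], [], 0.0
    for xn in x:
        pred = a * prev                      # prediction from the past reconstruction
        idx = round((xn - pred) / step)      # quantizer index of the prediction error
        prev = pred + idx * step             # reconstruction available at the encoder
        codes.append(idx)
        recon.append(prev)
    return codes, recon

def dpcm_decode(codes, a, step):
    """Rebuild the signal from the quantizer indices using the same predictor."""
    out, prev = [], 0.0
    for idx in codes:
        prev = a * prev + idx * step
        out.append(prev)
    return out

x = [math.sin(0.05 * n) for n in range(200)]   # slowly varying test input
codes, recon = dpcm_encode(x, a=0.95, step=0.05)
decoded = dpcm_decode(codes, a=0.95, step=0.05)
print("largest quantizer index:", max(abs(c) for c in codes))
print("largest reconstruction error:", max(abs(p - q) for p, q in zip(x, recon)))
```

Because both predictors are driven by the same quantized error and start from the same state here, the decoder output coincides with the reconstruction at the encoder, and the reconstruction error stays within half a quantizer step.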


1.6.4 Interference Cancellation 


Interference cancellation refers to situations where it is required to cancel an interfering 
signal/noise from a given signal that is a mixture of the desired signal and the interference. 
The principle of interference cancellation is to obtain an estimate of the interfering 
signal and subtract it from the corrupted signal. The feasibility of this idea relies on the 
availability of a reference source from which the interfering signal originates. 


2 ITU stands for International Telecommunication Union. 

Figure 1.14 ADPCM encoder-decoder. 

Figure 1.15 Interference cancellation. 

Figure 1.15 depicts the concept of interference cancellation in its simplest form. There 
are two inputs to the canceler: primary and reference. The primary input is the corrupted 
signal, that is, the desired signal plus interference. The reference input, on the other hand, 
originates from the interference source only.³ The adaptive filter is adjusted so that a 
replica of the interference signal that is present in the primary signal appears at its output, 
y(n). Subtracting this from the primary input results in an output that is cleared of 
interference, hence the name interference cancellation. 
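
A minimal sketch of the canceler follows, assuming an LMS-type update, a white Gaussian reference, a hypothetical two-tap path from the interference source to the primary input, and a sinusoid as the desired signal. None of these choices or names come from the text; they only make the two-input structure concrete.

```python
import math
import random

def lms_cancel(primary, reference, num_taps, mu):
    """Return the canceler output e(n) = primary(n) - y(n)."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps
    out = []
    for p, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # replica of the interference
        e = p - y                                    # cleaned-up output
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, buf)]
        out.append(e)
    return out

random.seed(2)
N = 4000
desired = [math.sin(0.05 * n) for n in range(N)]
reference = [random.gauss(0.0, 1.0) for _ in range(N)]
path = [0.8, 0.4]                     # hypothetical path from source to primary input
interference = [sum(c * reference[n - k] for k, c in enumerate(path) if n - k >= 0)
                for n in range(N)]
primary = [s + i for s, i in zip(desired, interference)]
cleaned = lms_cancel(primary, reference, num_taps=2, mu=0.005)

tail = range(3000, N)
before = sum((primary[n] - desired[n]) ** 2 for n in tail) / len(tail)
after = sum((cleaned[n] - desired[n]) ** 2 for n in tail) / len(tail)
print(f"interference power before: {before:.3f}, after: {after:.4f}")
```

Here the error signal, rather than the filter output, is taken as the result, and the desired signal itself acts as a disturbance to the adaptation, which is why a small step size is used.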


3 In some applications of interference cancellation, there might also be some leakage of the desired signal to the 
reference input. Here, we have ignored this situation for simplicity. 

Figure 1.16 Simplified diagram of a telephone network. 


We note that the interference cancellation configuration of Figure 1.15 is different from 
the previous cases of adaptive filters, in the sense that the residual error (which was 
discarded in the other cases) is the cleaned-up signal here. The desired signal in the previous 
cases has been replaced here by a noisy (corrupted) version of the actual desired signal. 
Moreover, the use of the term reference to refer to the adaptive filter input is clearly related 
to the role of this input in the canceler. 

In the rest of this section, we present some specific applications of interference 
canceling. 


Echo Cancellation in Telephone Lines 


Echoes in telephone lines mostly occur at points where hybrid circuits are used to convert 
four-wire networks to two-wire ones. Figure 1.16 presents a simplified diagram of a 
telephone connection network, highlighting the points where echoes occur. The two wires 
at the ends are subscriber loops connecting customers’ telephones to central offices. It 
may also include some portions of the local network. The four wires, on the other hand, 
are carrier systems (trunk lines) for medium-to-long-haul transmission. The distinction is 
that the two-wire segments carry signals in both directions on the same lines, while in 
the four-wire segment signals in the two directions are transmitted on two separate lines. 
Accordingly, the role of the hybrid circuit is to separate the signals in the two directions.
Perfect operation of the hybrid circuit requires that the incoming signal from the trunk 
lines should be directed to the subscriber line and that there be no leakage (echo) of 
that to the return line. In practice, however, such ideal behavior cannot be expected 
from hybrid circuits. There is always some echo on the return path. In the case of
voice communications (i.e., ordinary conversation on telephone lines), the effect of the echoes
becomes more obvious (and annoying to the speaker) in long-distance calls, where the
delay with which the echo returns to the speaker may be in the range of a few hundred 
milliseconds. In digital data transmission, both short- and long-delay echoes are serious. 

As was noted before and also can clearly be seen from Figure 1.17, the problem of 
echo cancellation may be viewed as one of system modeling. An adaptive filter is put
between the incoming and outgoing lines of the hybrid. By adapting the filter to realize
an approximation of the echo path, a replica of the echo is obtained at its output. This is
then subtracted from the outgoing signal to clear it of the undesirable echo.

Figure 1.17 Adaptive echo canceler.

Echo cancelers are usually implemented in transversal form. The time spread of echoes 
in a typical hybrid circuit is in the range of 20-30 ms. If we assume a sampling rate of
8 kHz for the operation of the echo canceler, an echo spread of 30 ms requires an adaptive
filter with at least 240 taps (30 ms × 8 kHz). This is a relatively long filter, requiring a
high-speed digital signal processor for its realization. Frequency domain processing is 
often used to reduce the high computational complexity of long filters. The subject of 
frequency domain adaptive filters is covered in Chapter 8. 
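
The tap-count arithmetic above is simply the echo spread multiplied by the sampling rate. The short Python sketch below makes this explicit; the function name `taps_needed` is a hypothetical helper introduced here, and integer arithmetic is used to avoid rounding surprises.

```python
def taps_needed(echo_spread_ms: int, sampling_rate_hz: int) -> int:
    """Smallest number of taps whose time span covers the given echo spread."""
    # taps = spread [s] x rate [Hz], rounded up to an integer
    return (echo_spread_ms * sampling_rate_hz + 999) // 1000

# Hybrid echo of 30 ms at an 8 kHz sampling rate
print(taps_needed(30, 8000))
# The same echo spread at a lower sampling rate needs proportionally fewer taps
print(taps_needed(30, 3000))
```

The same helper applies to any other combination of echo spread and sampling rate.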

The echo cancelers described previously are applicable to both voice and data trans- 
mission. However, more stringent conditions need to be satisfied in the case of data 
transmission. To maximize the usage of the available bandwidth, full-duplex data trans- 
mission is often used. This requires the use of a hybrid circuit for connecting the data 
modem to the two-wire subscriber loop, as shown in Figure 1.18. The leakage of the 
transmitted data back to the receiver input is thus inevitable and an echo canceler has to 
be added, as indicated in Figure 1.18. However, we note that data echo cancelers
differ in many ways from the voice echo cancelers used in central switching offices.
For instance, because the inputs to the data echo canceler are data symbols, it can operate
at the data symbol rate, which is in the range of 2.4-3 kHz (about three times smaller than
the 8 kHz sampling frequency used in voice echo cancelers). For a given echo spread, a
lower sampling frequency implies fewer taps for the echo canceler. Clearly, this
greatly simplifies the implementation of the echo canceler. On the other hand, data
echo cancelers must achieve a much higher level of echo cancellation to ensure
reliable transmission of data at higher bit rates. In addition, the echoes returned from the other
side of the trunk lines should also be taken care of. Detailed discussions on these issues
can be found in Lee and Messerschmitt (1994) and Gitlin, Hayes, and Weinstein (1992). 


Acoustic Echo Cancellation 


The problem of acoustic echo cancellation can be best explained by referring to 
Figure 1.19, which depicts the scenario that arises in teleconferencing applications. 
The speech signal from a far-end speaker, received through a communication channel, 
is broadcast by a loudspeaker in a room and its echo is picked up by a microphone.
This echo must be canceled to prevent its feedback to the far-end speaker. The
microphone also picks up the speech of the near-end speaker(s) and any background noise
that may exist in the room. An adaptive transversal filter with sufficient length is used
to model the acoustics of the room. A replica of the loudspeaker echo is then obtained
and subtracted from the microphone signal before the transmission.

Figure 1.19 Acoustic echo cancellation.

Clearly, the problem of acoustic echo cancellation can also be posed as one of system 
modeling. The main challenge here is that the echo paths spread over a relatively long
length in time. For typical office rooms, echo spreads in the range of 100-250 ms are
quite common. For a sampling rate of 8 kHz, this would mean 800-2000 taps! Thus, the
main problem of acoustic echo cancellation is that of realizing very long adaptive filters.
In addition, as speech is a lowpass signal, it becomes necessary to use special algorithms
to ensure fast adaptation of the echo canceler. The algorithms discussed in Chapters 8
and 9 have been widely used to overcome these difficulties in the implementation
of acoustic echo cancelers. The topic of echo cancelers, with particular emphasis on
acoustic echo cancelers, is covered in Chapter 15.

Figure 1.20 Active noise cancellation in a narrow duct.


Active Noise Control 


Active noise control (ANC) refers to situations where acoustic antinoise waves are generated from electronic
circuits (Kuo and Morgan, 1996). ANC can be best explained by the following example.

A well-examined application of ANC is cancellation of noise in narrow ducts, such as 
exhaust pipes and ventilation systems, as illustrated in Figure 1.20. The acoustic noise 
traveling along the duct is picked up by a microphone at position A. This is used as the
reference input to an ANC filter whose parameters are adapted so that its output, after
conversion to an acoustic wave (through the canceling loudspeaker), is equal to the
negative value of the duct noise at position B, thereby canceling it. The residual noise,
picked up by the error microphone at position C, is the error signal used for adaptation 
of the ANC filter. 

Comparing this ANC setup with the interference cancellation setup shown in 
Figure 1.15, we may note the following. The source of interference here is the duct
noise, the reference input is the noise picked up by the reference microphone, the desired output
(i.e., what we wish to see after canceling the duct noise) is zero, and the primary input is
the duct noise reaching position B. Accordingly, the role of the ANC filter is to model the
response of the duct from position A to B.

The above description of ANC assumes that the duct is narrow and the acoustic noise 
waves are traveling along the duct, which is like a one-dimensional model. The acoustical
models of wider ducts and large enclosures, such as cars and aircraft, are usually
more complicated. Multiple microphones/loudspeakers are needed for successful implementation
of ANC in such enclosures. The adaptive filtering problem is then that of a
multiple-input multiple-output system (Kuo and Morgan, 1996). Nevertheless, the basic 
principle remains the same, that is, generation of antinoise to cancel the actual noise. The 
subject of active noise control is covered in detail in Chapter 16.

Figure 1.21 Spatial filtering (beamforming).


Beamforming 


In the applications that have been discussed so far, the filters/predictors are used to 
combine together samples of the input signal(s) at different time instants to generate 
the output. Hence, these are classified as temporal filtering. Beamforming, however, is 
different from these in the sense that the inputs to a beamformer are samples of incoming 
signals at different positions in space. This is called spatial filtering. Beamforming finds 
applications in communications, radar, and sonar (Johnson and Dudgeon, 1993), and also 
imaging in radar and medical engineering (Soumekh, 1994). 

In spatial filtering, a number of independent sensors are placed at different points 
in space to pick up signals coming from various sources (Figure 1.21). In radar and 
communications, the signals are usually electromagnetic waves and the sensors are thus 
antenna elements. Accordingly, the term antenna arrays is often used to refer to these 
applications of beamformers. In sonar applications, the sensors are hydrophones designed 
to respond to acoustic waves. 

In a beamformer, the samples of the signals picked up by the sensors at a particular
instant of time constitute a snapshot. The samples of a snapshot (spatial samples) play
the same role as the successive (temporal) samples of input in a transversal filter. The
beamformer filter linearly combines the sensor signals so that signals arriving from some
particular direction are amplified, while signals from other directions are attenuated. Thus,
in analogy with the frequency response of temporal filters, spatial filters have responses
that vary according to the direction-of-arrival of the incoming signal(s). This is given in
the form of a polar plot (gain versus angle) and is referred to as the beam pattern.

In many applications of beamformers, the signals picked up by sensors are narrow-band
signals having the same carrier (center) frequency. These signals differ in their directions
of arrival, which are related to the locations of their sources. The operation of beamformers
in such applications can be best explained by the following example. 

Consider an antenna array consisting of two omnidirectional elements A and B, as 
presented in Figure 1.22. The tone (as an approximation to narrow-band) signals
$s(n) = \alpha\cos\omega_o n$ and $v(n) = \beta\cos\omega_o n$, arriving at angles 0 and $\theta_o$ (with respect to the line
perpendicular to the line connecting A and B), respectively, are the inputs to the array
(beamformer) filter, which consists of a phase-shifter and a subtracter. The signal s(n)
arrives at elements A and B at the same time, whereas the arrival times of signal v(n) at
A and B are different. We may thus write

\[ s_A(n) = s_B(n) = \alpha\cos\omega_o n \]

\[ v_B(n) = \beta\cos\omega_o n \]

and

\[ v_A(n) = \beta\cos(\omega_o n - \varphi) \]

where subscripts A and B are used to denote the signals picked up by elements A and B,
respectively, and $\varphi$ is the phase shift arising from the time delay of arrival of v(n) at element
A with respect to its arrival at element B.

Figure 1.22 A two-element beamformer.

Now, if we assume that s(n) is the desired signal and v(n) is an interference, by
inspection, one can see that if the phase-shifter phase is chosen equal to $\varphi$, then the
interference, v(n), will be completely canceled by the beamformer. The desired signal,
on the other hand, reaches the beamformer output as $\alpha(\cos\omega_o n - \cos(\omega_o n - \varphi))$, which
is nonzero (and still holding the information contained in its envelope, $\alpha$) when $\varphi \neq 0$,
that is, when the interference direction is different from the direction of the desired signal.
This shows that one can tune a beamformer so as to allow the desired signal arriving 
from a direction to pass through it, while rejecting the unwanted signals (interferences) 
arriving from other directions. 
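
The two-element example can be checked numerically. In the Python sketch below, the tones are represented in complex (analytic) form so that the phase-shifter reduces to a multiplication by a complex exponential; this representation, the function name `two_element_beamformer`, and the numerical values of the amplitudes, frequency, and phase shift are all assumptions made here for illustration.

```python
import numpy as np

def two_element_beamformer(x_a, x_b_analytic, phase):
    """Subtract a phase-shifted copy of the element-B signal from the element-A signal.

    x_b_analytic is the complex form of the tone(s) at element B, so that a
    phase lag of `phase` radians is a multiplication by exp(-j*phase).
    """
    shifted_b = np.real(x_b_analytic * np.exp(-1j * phase))
    return x_a - shifted_b

alpha, beta = 1.0, 3.0          # amplitudes of desired signal and interference
omega_o = 0.2 * np.pi           # common tone frequency (radians/sample)
phi = 0.7                       # phase shift of the interference between A and B
n = np.arange(200)

# Element A: desired tone plus phase-shifted interference
x_a = alpha * np.cos(omega_o * n) + beta * np.cos(omega_o * n - phi)
# Element B: desired tone plus interference, both with zero phase (complex form)
x_b_analytic = (alpha + beta) * np.exp(1j * omega_o * n)

output = two_element_beamformer(x_a, x_b_analytic, phi)
expected = alpha * (np.cos(omega_o * n) - np.cos(omega_o * n - phi))
print(np.max(np.abs(output - expected)))   # interference term is gone
```

With the phase-shifter set to the interference phase shift, the output contains no trace of the interference amplitude and depends only on the desired signal.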

The idea of using a phase-shifter to adjust the beam pattern of two sensors is easily 
extendible to the general case of more than two sensors. In general, by introducing appro- 
priate phase shifts and also gains at the output of the various sensors and summing up 
these outputs, one can realize any arbitrary beam pattern. This is similar to the selection 
of tap weights of a transversal filter so that the filter frequency response becomes a good 
approximation to the desired response. Clearly, by increasing the number of elements in 
the array, better approximations to the desired beam pattern can be achieved. 

The last point that we wish to add here is that in cases where the input signals to the 
beamformer are not narrow band, a combination of spatial and temporal filtering needs to 
be used. In such cases, spatial information is obtained by having sensors at different posi- 
tions in space, as was discussed previously. The temporal information is obtained using 
a transversal filter at the output of each sensor. The output of the broadband beamformer 
is the summation of the outputs of these transversal filters. Detailed discussions on these 
points and the relevant mathematical backgrounds are presented in Chapter 18. 


2 


Discrete-Time Signals 
and Systems 


Most of the adaptive algorithms have been developed for discrete-time (sampled) sig- 
nals. Hence, discrete-time systems are used for implementation of adaptive filters. In this 
chapter, we present a short review of discrete-time signals and systems. Our assumption 
is that the reader is familiar with the basic concepts of discrete-time systems, such as the 
Nyquist sampling theorem, z-transform and system function, and also with the theory of 
random variables and stochastic processes. Our goal, in this chapter, is to review these 
concepts and put them in a framework appropriate for the rest of the book. 


2.1 Sequences and z-Transform 


In discrete-time systems, we are concerned with processing signals that are represented 
by sequences. Such sequences may be samples of a continuous-time analog signal or 
may be discrete in nature. As an example, in the channel equalizer structure presented 
in Figure 1.9, the input sequence to the equalizer, x(n), consists of the samples of the 
channel output which is an analog signal, but the original data sequence, s(n), is discrete 
in nature. 

A discrete-time sequence, x(n), may be equivalently represented by its z-transform 
defined as 


\[ X(z) = \sum_{n=-\infty}^{\infty} x(n)z^{-n} \tag{2.1} \]


where z is a complex variable. The range of values of z for which the above summa- 
tion converges is called the region of convergence of X (z). The following two examples 
illustrate this. 


Example 2.1 


Consider the sequence

\[ x_1(n) = \begin{cases} a^n, & n \ge 0 \\ 0, & n < 0 \end{cases} \tag{2.2} \]

The z-transform of $x_1(n)$ is

\[ X_1(z) = \sum_{n=0}^{\infty} a^n z^{-n} = \sum_{n=0}^{\infty} (az^{-1})^n \]

which converges to

\[ X_1(z) = \frac{1}{1 - az^{-1}} \tag{2.3} \]

for $|az^{-1}| < 1$, that is, $|z| > |a|$. We may also write

\[ X_1(z) = \frac{z}{z - a} \tag{2.4} \]

for $|z| > |a|$.
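
Example 2.2

Consider the sequence

\[ x_2(n) = \begin{cases} 0, & n \ge 0 \\ b^n, & n < 0 \end{cases} \tag{2.5} \]

The z-transform of $x_2(n)$ is

\[ X_2(z) = \sum_{n=-\infty}^{-1} b^n z^{-n} = \sum_{n=1}^{\infty} (b^{-1}z)^n \]

which converges to

\[ X_2(z) = \frac{b^{-1}z}{1 - b^{-1}z} = \frac{z}{b - z} \tag{2.6} \]

for $|z| < |b|$.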


The two sequences presented in the above examples are different in many respects. The
sequence $x_1(n)$ in Example 2.1 is called right-sided, since its nonzero elements start at a
finite $n = n_1$ (here, $n_1 = 0$) and extend up to $n = +\infty$. On the other hand, the sequence
$x_2(n)$ in Example 2.2 is a left-sided one. Its nonzero elements start at a finite $n = n_2$
(here, $n_2 = -1$) and extend up to $n = -\infty$. This definition of right-sided and left-sided
sequences also implies that the region of convergence of a right-sided sequence is always
the exterior of a circle (|z| > |a|, in Example 2.1), while that of a left-sided sequence is
always the interior of a circle (|z| < |b|, in Example 2.2).

We thus note that the specification of the z-transform, X(z), of a sequence is complete
only when its region of convergence is also specified. In other words, the inverse
z-transform of X(z) can be uniquely found only if its region of convergence is also
specified. For example, one may note that $X_1(z)$ and $X_2(z)$ in the above examples have
exactly the same form, except a sign reversal. Hence, if their regions of convergence are 
not specified, both may be interpreted as the z-transforms of either left-sided or right-sided 
sequences. 
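
The role of the region of convergence can also be seen numerically. The following Python sketch evaluates partial sums of the z-transform of a right-sided exponential sequence at one point outside the circle |z| = |a| and at one point inside it; the value of a, the two test points, and the helper name `partial_sum_right_sided` are arbitrary choices made here for illustration.

```python
import numpy as np

def partial_sum_right_sided(a, z, num_terms):
    """Partial sum of sum_{n>=0} a^n z^{-n}, truncated to num_terms terms."""
    n = np.arange(num_terms)
    return np.sum((a / z) ** n)

a = 0.8
z_in_roc = 1.2 * np.exp(0.5j)      # |z| > |a|: inside the region of convergence
z_out_roc = 0.5 * np.exp(0.5j)     # |z| < |a|: outside the region of convergence

closed_form = z_in_roc / (z_in_roc - a)
approx = partial_sum_right_sided(a, z_in_roc, 200)
diverging = abs(partial_sum_right_sided(a, z_out_roc, 200))

print(abs(approx - closed_form))   # partial sums settle on z/(z - a)
print(diverging)                   # partial sums grow without bound
```

Inside the region of convergence the partial sums settle on the closed-form expression, whereas outside it they grow with the number of terms, so the closed form alone says nothing about the sequence there.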

Two-sided sequences may also exist. A two-sided sequence is one that extends from
$n = -\infty$ to $n = +\infty$. The following example shows how to deal with two-sided sequences.


Example 2.3 


Consider the sequence

\[ x_3(n) = \begin{cases} a^n, & n \ge 0 \\ b^n, & n < 0 \end{cases} \tag{2.7} \]

where |a| < |b|. The condition |a| < |b|, as we shall see, is necessary to make the
convergence of the z-transform of $x_3(n)$ possible.

The z-transform of $x_3(n)$ is

\[ X_3(z) = \sum_{n=-\infty}^{-1} b^n z^{-n} + \sum_{n=0}^{\infty} a^n z^{-n} \tag{2.8} \]

Clearly, the first sum converges when |z| < |b|, and the second sum converges when
|z| > |a|. Thus, we obtain

\[ X_3(z) = \frac{z}{b - z} + \frac{z}{z - a} = \frac{z(a - b)}{(z - a)(z - b)} \tag{2.9} \]

for |a| < |z| < |b|.


We may note that the region of convergence of X3(z) is the area in between two 
concentric circles. This is true, in general, for all two-sided sequences. For a sequence with
a rational z-transform, the radii of the two circles are determined by two of the poles of the
z-transform of the sequence. The right-sided part of the sequence is determined by the poles
that are surrounded by the region of convergence, and the poles surrounding the region
of convergence determine the left-sided part of the sequence. The following example,
which also shows one way of calculating the inverse z-transform, clarifies the above points.


Example 2.4 

Consider a two-sided sequence, x(n), with the z-transform

\[ X(z) = \frac{-0.1z^{-1} + 3.05z^{-2}}{(1 - 0.5z^{-1})(1 + 0.7z^{-1})(1 + 2z^{-1})} \tag{2.10} \]

and the region of convergence 0.7 < |z| < 2.

To find x(n), that is, the inverse z-transform of X(z), we use the method of partial
fractions and expand X(z) as

\[ X(z) = \frac{A}{1 - 0.5z^{-1}} + \frac{B}{1 + 0.7z^{-1}} + \frac{C}{1 + 2z^{-1}}, \]

where A, B, and C are constants that can be determined as follows:

\[ A = (1 - 0.5z^{-1})X(z)\big|_{z=0.5} = 1 \]

\[ B = (1 + 0.7z^{-1})X(z)\big|_{z=-0.7} = -2 \]

\[ C = (1 + 2z^{-1})X(z)\big|_{z=-2} = 1 \]

This gives

\[ X(z) = \frac{1}{1 - 0.5z^{-1}} - \frac{2}{1 + 0.7z^{-1}} + \frac{1}{1 + 2z^{-1}} \tag{2.11} \]

We treat each of the terms in the above equation separately. To expand these terms and,
from there, extract their corresponding sequences, we use the following identity, which
holds for |a| < 1,

\[ \frac{1}{1 - a} = 1 + a + a^2 + \cdots \]

We note that within the region of convergence of X(z), $|0.5z^{-1}|$ and $|0.7z^{-1}|$ are both
less than 1, and thus

\[ \frac{1}{1 - 0.5z^{-1}} = 1 + 0.5z^{-1} + 0.5^2 z^{-2} + \cdots \tag{2.12} \]

and

\[ \frac{2}{1 + 0.7z^{-1}} = \frac{2}{1 - (-0.7)z^{-1}} = 2\left[1 + (-0.7)z^{-1} + (-0.7)^2 z^{-2} + \cdots\right] \tag{2.13} \]

However, for the third term on the right-hand side of Eq. (2.11), $|2z^{-1}| > 1$, and, thus, an
expansion similar to the last two is not applicable. A similar expansion will be possible,
if we rearrange this term as

\[ \frac{1}{1 + 2z^{-1}} = \frac{0.5z}{1 + 0.5z} \]

Here, within the region of convergence of X(z), |0.5z| < 1, and, thus, we may write

\[ \frac{0.5z}{1 + 0.5z} = 0.5z\left(1 + (-0.5)z + (-0.5)^2 z^2 + \cdots\right) = -(-2)^{-1}z - (-2)^{-2}z^2 - (-2)^{-3}z^3 - \cdots \tag{2.14} \]

Substituting Eqs. (2.12), (2.13), and (2.14) in Eq. (2.11) and recalling Eq. (2.1),
we obtain

\[ x(n) = \begin{cases} -(-2)^n, & n < 0 \\ 0.5^n - 2(-0.7)^n, & n \ge 0 \end{cases} \tag{2.15} \]
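
The result of this example can be sanity-checked numerically. The Python sketch below sums x(n)z^{-n}, with x(n) equal to -(-2)^n for n < 0 and 0.5^n - 2(-0.7)^n for n >= 0, at a point inside the region of convergence 0.7 < |z| < 2, and compares the sum with the rational form of X(z). The test point, the truncation length, and the function names are arbitrary choices made here for illustration.

```python
import numpy as np

def X(z):
    """Rational z-transform of the example, written in powers of z^{-1}."""
    zi = 1.0 / z
    return (-0.1 * zi + 3.05 * zi**2) / ((1 - 0.5 * zi) * (1 + 0.7 * zi) * (1 + 2 * zi))

def x(n):
    """Two-sided sequence obtained by the partial-fraction expansion."""
    if n < 0:
        return -((-2.0) ** n)
    return 0.5 ** n - 2 * (-0.7) ** n

z0 = 1.2 * np.exp(0.3j)          # 0.7 < |z0| < 2, inside the region of convergence
series = sum(x(n) * z0 ** (-n) for n in range(-200, 201))

print(abs(series - X(z0)))       # truncated series agrees with the rational form
```

Choosing a test point outside the annulus would make one of the two halves of the series diverge, which is another way of seeing why the region of convergence is part of the specification.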


An alternative way of performing inverse z-transform can be derived using the Cauchy 
integral theorem, which is stated as follows: 
\[ \frac{1}{2\pi j}\oint_C z^{k-1}\,dz = \begin{cases} 1, & k = 0 \\ 0, & k \neq 0 \end{cases} \tag{2.16} \]


where C is a counterclockwise contour that encircles the origin.

The z-transform relation, reproduced here for convenience, is given by

\[ X(z) = \sum_{n=-\infty}^{\infty} x(n)z^{-n} \tag{2.17} \]


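Multiplying both sides of Eq. (2.17) by $z^{k-1}$ and integrating, we obtain

\[ \frac{1}{2\pi j}\oint_C X(z)z^{k-1}\,dz = \frac{1}{2\pi j}\oint_C \sum_{n=-\infty}^{\infty} x(n)z^{-n+k-1}\,dz \tag{2.18} \]

where C is a contour within the region of convergence of X(z) and encircling the origin.
Interchanging the order of integration and summation on the right-hand side of Eq. (2.18),
we obtain

\[ \frac{1}{2\pi j}\oint_C X(z)z^{k-1}\,dz = \sum_{n=-\infty}^{\infty} x(n)\,\frac{1}{2\pi j}\oint_C z^{-n+k-1}\,dz \tag{2.19} \]

Application of the Cauchy integral theorem in Eq. (2.19) leaves only the term n = k on the
right-hand side, namely x(k). Replacing k by n gives the inverse z-transform
relation

\[ x(n) = \frac{1}{2\pi j}\oint_C X(z)z^{n-1}\,dz \tag{2.20} \]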


where C is a counterclockwise closed contour in the region of convergence of X (z) and 
encircling the origin of the z-plane. 
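
The inverse z-transform integral can be approximated numerically by sampling a circular contour. On a circle of radius r we have z = r e^{jθ} and z^{n-1}dz = j z^n dθ, so the contour integral reduces to an average of X(z)z^n over the circle. The Python sketch below applies this to the right-sided and left-sided exponentials of Examples 2.1 and 2.2; the number of contour points, the radii, and the values of a and b are arbitrary choices made here, and the function name is hypothetical.

```python
import numpy as np

def inverse_z_on_circle(X, n, radius=1.0, num_points=512):
    """Approximate (1/(2*pi*j)) * contour integral of X(z) z^(n-1) dz on |z| = radius."""
    theta = 2 * np.pi * np.arange(num_points) / num_points
    z = radius * np.exp(1j * theta)
    # z^(n-1) dz = j z^n dtheta, so the integral is the mean of X(z) z^n
    return np.mean(X(z) * z ** n)

a, b = 0.5, 2.0
X1 = lambda z: z / (z - a)       # right-sided a^n, valid for |z| > |a|
X2 = lambda z: z / (b - z)       # left-sided b^n, valid for |z| < |b|

# The unit circle lies in both regions of convergence for these values
print(inverse_z_on_circle(X1, 3))    # close to a**3
print(inverse_z_on_circle(X1, -2))   # close to 0
print(inverse_z_on_circle(X2, -1))   # close to b**(-1)
```

Moving the contour outside the region of convergence (for instance, radius 0.25 for the first transform) would return samples of a different sequence with the same rational form.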

For rational z-transforms, contour integrals are often conveniently evaluated using the
residue theorem, that is,

\[ x(n) = \frac{1}{2\pi j}\oint_C X(z)z^{n-1}\,dz = \sum \left[\text{residues of } X(z)z^{n-1} \text{ at the poles inside } C\right] \tag{2.21} \]

In general, if $X(z)z^{n-1}$ is a rational function of z, and $z_p$ is a pole of $X(z)z^{n-1}$, repeated
m times, then

\[ \text{residue of } X(z)z^{n-1} \text{ at } z_p = \frac{1}{(m-1)!}\left[\frac{d^{m-1}\psi(z)}{dz^{m-1}}\right]_{z=z_p} \tag{2.22} \]

where $\psi(z) = (z - z_p)^m X(z)z^{n-1}$. In particular, if there is a first-order pole at $z = z_p$,
that is, m = 1, then

\[ \text{residue of } X(z)z^{n-1} \text{ at } z_p = \psi(z_p) \tag{2.23} \]


2.2 Parseval’s Relation 


Among the various important results and properties of the z-transform, in this book we are
particularly interested in Parseval’s relation, which states that for any pair of sequences
x(n) and y(n),

\[ \sum_{n=-\infty}^{\infty} x(n)y^*(n) = \frac{1}{2\pi j}\oint_C X(z)Y^*(1/z^*)z^{-1}\,dz \tag{2.24} \]

where the superscript * denotes complex conjugation and the contour of integration is
taken in the overlap of the regions of convergence of X(z) and $Y^*(1/z^*)$. If X(z) and
Y(z) converge on the unit circle, we can choose $z = e^{j\omega}$, and Eq. (2.24) becomes

\[ \sum_{n=-\infty}^{\infty} x(n)y^*(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})Y^*(e^{j\omega})\,d\omega \tag{2.25} \]


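Furthermore, if y(n) = x(n), for all n, Eq. (2.25) becomes

\[ \sum_{n=-\infty}^{\infty} |x(n)|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(e^{j\omega})|^2\,d\omega \tag{2.26} \]

Equation (2.26) has the following interpretation. The total energy in a sequence x(n), that
is, $\sum_{n=-\infty}^{\infty} |x(n)|^2$, may be equivalently obtained by averaging $|X(e^{j\omega})|^2$ over one of
its periods in $\omega$, that is, over an interval of length $2\pi$.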
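
For finite-length sequences, Parseval's relation on the unit circle can be verified with a DFT, since a uniform average of the DFT samples equals the frequency-domain average exactly when the DFT length is at least the sequence length. The Python sketch below does this for two random complex sequences; the sequence length, DFT length, and random seed are arbitrary choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
length, dft_size = 64, 256
x = rng.standard_normal(length) + 1j * rng.standard_normal(length)
y = rng.standard_normal(length) + 1j * rng.standard_normal(length)

# Samples of X(e^{jw}) and Y(e^{jw}) on a uniform grid over one period
X_w = np.fft.fft(x, dft_size)
Y_w = np.fft.fft(y, dft_size)

time_side = np.sum(x * np.conj(y))          # sum of x(n) y*(n)
freq_side = np.mean(X_w * np.conj(Y_w))     # average of X Y* over one period

energy_time = np.sum(np.abs(x) ** 2)        # total energy of x(n)
energy_freq = np.mean(np.abs(X_w) ** 2)     # average of |X|^2 over one period

print(abs(time_side - freq_side), abs(energy_time - energy_freq))
```

Both differences are at the level of floating-point rounding.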


2.3 System Function 


Consider a discrete-time linear time-invariant system with the impulse response h(n). 
With x(n) and y(n) denoting, respectively, the input and output of the system, 


\[ y(n) = x(n) \star h(n) \tag{2.27} \]

where $\star$ denotes convolution and is defined as

\[ x(n) \star h(n) = \sum_{k=-\infty}^{\infty} x(k)h(n - k) \tag{2.28} \]


Equation (2.27) suggests that any linear time-invariant system is completely characterized 
by its impulse response, h(n). Taking the z-transform of both sides of Eq. (2.27), we obtain


Y(z) = X(z)H(z) (2.29) 


This shows that the input-output relation for a linear time-invariant system corresponds 
to a multiplication of the z-transforms of the input and the impulse response of the system. 
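
The correspondence between convolution in time and multiplication of z-transforms is easy to confirm for short causal sequences. In the Python sketch below, the sequences, the evaluation point `z0`, and the helper `z_transform` are hypothetical choices introduced only for this check.

```python
import numpy as np

def z_transform(seq, z):
    """Evaluate sum_n seq[n] z^{-n} for a finite causal sequence starting at n = 0."""
    seq = np.asarray(seq, dtype=complex)
    n = np.arange(len(seq), dtype=float)
    return np.sum(seq * z ** (-n))

x = [1.0, 2.0, 3.0]              # input sequence
h = [1.0, -0.5, 0.25]            # impulse response
y = np.convolve(x, h)            # output sequence, y(n) = sum_k x(k) h(n - k)

z0 = 0.9 * np.exp(0.4j)          # arbitrary evaluation point
print(abs(z_transform(y, z0) - z_transform(x, z0) * z_transform(h, z0)))
```

The same check holds at any nonzero z, because both sides are finite polynomials in z^{-1}.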

The z-transform of the impulse response of a linear time-invariant system is referred to 
as its system function. The system function evaluated over the unit circle, |z| = 1, is the frequency
response of the system, $H(e^{j\omega})$. For any particular frequency $\omega$, $H(e^{j\omega})$ is the gain
(complex-valued, in general) of the system, when its input is the complex sinusoid $e^{j\omega n}$.

Any stable linear time-invariant system has a finite frequency response for all values of
$\omega$. This means that the region of convergence of H(z) has to include the unit circle. This
fact can be used to uniquely determine the region of convergence of any rational system
function, once its poles are known. As an example, if we consider a sequence with the
z-transform

\[ H(z) = \frac{1}{(1 - 0.5z^{-1})(1 - 2z^{-1})} \tag{2.30} \]

we find that there are three possible regions of convergence for H(z) as specified in
Figure 2.1. These are regions I, II, and III, each giving a different time sequence. However,
if we assume that H(z) is the system function of a stable time-invariant system, the only
acceptable region of convergence will be region II. Noting this, we obtain (Problem 2.1)

\[ h(n) = \begin{cases} -\frac{4}{3} \times 2^n, & n < 0 \\ -\frac{1}{3} \times 0.5^n, & n \ge 0 \end{cases} \tag{2.31} \]

Figure 2.1 Possible regions of convergence for a two-pole z-transform.


We note that the impulse response h(n), obtained above, extends from $n = -\infty$ to
$n = +\infty$. This means that, although the input, $\delta(n)$, is applied at time n = 0, the system
output takes nonzero values even before that. Such a system is called noncausal. In
contrast to this, a system is said to be causal if its impulse response is nonzero only for
nonnegative values of n. Noncausal systems, although not realistic, may be encountered in
some theoretical developments. It is important that we find a practical solution for handling
such cases. The following example considers such a case and gives a solution to it.


Example 2.5 


Figure 2.2 shows a communication system. It consists of a communication channel which 
is characterized by the system function 


\[ C(z) = 1 - 2.5z^{-1} + z^{-2} = (1 - 0.5z^{-1})(1 - 2z^{-1}) \tag{2.32} \]


The equalizer, H(z), should be selected so that the original transmitted signal, s(n), can 
be recovered from the equalizer output without any distortion. 

For this, we shall select H(z) so that we get y(n) = s(n). This can be achieved if H(z)
is selected so that C(z)H(z) = 1. This gives

\[ H(z) = \frac{1}{C(z)} = \frac{1}{(1 - 0.5z^{-1})(1 - 2z^{-1})} \tag{2.33} \]

Figure 2.2 A communication system.


Noting that this is similar to H(z) in Eq. (2.30), we find that the equalizer impulse 
response is the one given in Eq. (2.31). This, of course, is noncausal and, therefore, not 
realizable. The problem can be easily solved by shifting the noncausal response of the
equalizer to the right by a sufficient number of samples so that the remaining noncausal
samples are sufficiently small and can be ignored. Mathematically, we say

\[ H(z) \approx \frac{z^{-\Delta}}{C(z)} \tag{2.34} \]

where $\Delta$ is the number of sample delays introduced to achieve a realizable causal system.
We use the approximation sign, $\approx$, in Eq. (2.34), since we ignore the noncausal samples
of $z^{-\Delta}/C(z)$. The equalizer output is then $s(n - \Delta)$.
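
The delay-and-truncate idea can be illustrated with a short Python sketch. It takes the stable two-sided response with poles at 0.5 and 2, namely h(n) = -(4/3)2^n for n < 0 and h(n) = -(1/3)0.5^n for n >= 0, delays it by Δ samples, discards the samples that remain noncausal, and convolves the result with the channel. The delay of 20 samples, the truncation of the causal tail, and the function names are arbitrary choices made here for illustration.

```python
import numpy as np

def h_stable(n):
    """Two-sided impulse response of 1/C(z) for the region 0.5 < |z| < 2."""
    if n < 0:
        return -(4.0 / 3.0) * 2.0 ** n
    return -(1.0 / 3.0) * 0.5 ** n

def delayed_equalizer(delay, causal_len):
    """Causal approximation: h(n - delay) kept for n = 0, ..., delay + causal_len - 1."""
    return np.array([h_stable(k - delay) for k in range(delay + causal_len)])

channel = np.array([1.0, -2.5, 1.0])      # C(z) = 1 - 2.5 z^-1 + z^-2
delay = 20
equalizer = delayed_equalizer(delay, causal_len=20)

overall = np.convolve(channel, equalizer)  # response of channel followed by equalizer
target = np.zeros(len(overall))
target[delay] = 1.0                        # ideal result: a pure delay of `delay` samples
max_error = np.max(np.abs(overall - target))

print(overall[delay], max_error)
```

The only deviations from a pure delay come from the discarded samples at the two ends of the truncated response, and they shrink as the delay and the retained length are increased.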


2.4 Stochastic Processes 


The input signal to an adaptive filter and its desired output are, in general, random, that
is, they are not known a priori. However, they exhibit some statistical characteristics 
which have to be utilized for optimum adjustment of the filter coefficients. Such random 
signals are called stochastic processes. Adaptive algorithms are designed to extract these 
characteristics and use them for adjusting the filter coefficients. 

A discrete-time stochastic process is an indexed set of random variables
{x(n); n = ..., −2, −1, 0, 1, 2, ...}. As a random signal, the index n is associated with time or
possibly some other physical dimension. In this book, for convenience, we frequently
refer to n as the time index. So far, we have used the notation x(n) to refer to a particular
sequence x(n) that extends from $n = -\infty$ to $n = +\infty$. We use the notation {x(n)} for a
stochastic process of which a particular sequence x(n) may be a single realization.

The elements of a stochastic process, {x(n)}, for different values of n, are in general 
complex-valued random variables that are characterized by their probability distribution 
functions. The inter-relationships between different elements of {x(n)} are determined by
their joint distribution functions. Such distribution functions, in general, may change with
the time index n. A stochastic process is called stationary in the strict sense, if all of its 
(single and joint) distribution functions are independent of a shift in the time origin. 


2.4.1 Stochastic Averages 


It is often useful to characterize stochastic processes by statistical averages of their ele- 
ments. These averages are called ensemble averages and, in general, are time dependent. 
For example, the mean of the nth element of a stochastic process {x(n)}, is defined as 


\[ m_x(n) = E[x(n)] \tag{2.35} \]

where E[·] denotes statistical expectation. It should be noted that since $m_x(n)$ is in general
a function of n, it may not be possible to obtain $m_x(n)$ by time averaging of a single
realization of the stochastic process {x(n)}, unless the process possesses certain special
properties indicated at the end of this chapter. Instead, n has to be fixed and averaging
has to be done over the nth element of the stochastic process, as a single random variable.
In our later developments, we are heavily dependent on the following averages:


1. Autocorrelation Function: For a stochastic process {x(n)}, it is defined as

\[ \phi_{xx}(n, m) = E[x(n)x^*(m)] \tag{2.36} \]

where the superscript * denotes complex conjugation.

2. Cross-Correlation Function: It is defined for two stochastic processes {x(n)} and
{y(n)} as

\[ \phi_{xy}(n, m) = E[x(n)y^*(m)] \tag{2.37} \]


A stochastic process {x(n)} is said to be stationary in the wide sense, if $m_x(n)$ and
$\phi_{xx}(n, m)$ are independent of a shift of time origin. That is, for any k, m, and n,

\[ m_x(n) = m_x(n + k) \]

and

\[ \phi_{xx}(n, m) = \phi_{xx}(n + k, m + k) \]

These imply that $m_x(n)$ is a constant for all n and $\phi_{xx}(n, m)$ depends on the difference
n − m only. Then, it would be more appropriate to define the autocorrelation function of
{x(n)} as

\[ \phi_{xx}(k) = E[x(n)x^*(n - k)] \tag{2.38} \]


Similarly, the processes {x(n)} and {y(n)} are said to be jointly stationary in the wide 
sense, if their means are independent of n and $\phi_{xy}(n, m)$ depends on n − m only. We
may, then, define the cross-correlation function of {x(n)} and {y(n)} as

\[ \phi_{xy}(k) = E[x(n)y^*(n - k)] \tag{2.39} \]


Besides the autocorrelation and cross-correlation functions, autocovariance and cross- 
covariance functions are also defined. For stationary processes, these are defined as 


\[ \gamma_{xx}(k) = E[(x(n) - m_x)(x(n - k) - m_x)^*] \tag{2.40} \]

and

\[ \gamma_{xy}(k) = E[(x(n) - m_x)(y(n - k) - m_y)^*] \tag{2.41} \]

respectively. By expanding the right-hand sides of Eqs. (2.40) and (2.41), we obtain

\[ \gamma_{xx}(k) = \phi_{xx}(k) - |m_x|^2 \tag{2.42} \]

and

\[ \gamma_{xy}(k) = \phi_{xy}(k) - m_x m_y^* \tag{2.43} \]

respectively. This shows that the correlation and covariance functions differ by bias terms
that are determined by the means of the corresponding processes.

We may also note that for many random signals, the signal samples become less cor- 
related as they become more separated in time. Thus, we may write 


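\[ \lim_{k\to\infty} \phi_{xx}(k) = |m_x|^2 \tag{2.44} \]

\[ \lim_{k\to\infty} \gamma_{xx}(k) = 0 \tag{2.45} \]

\[ \lim_{k\to\infty} \phi_{xy}(k) = m_x m_y^* \tag{2.46} \]

\[ \lim_{k\to\infty} \gamma_{xy}(k) = 0 \tag{2.47} \]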


Other important properties of the correlation and covariance functions that should be 
noted here are their symmetry properties which are summarized below: 


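\[ \phi_{xx}(k) = \phi_{xx}^*(-k) \tag{2.48} \]

\[ \gamma_{xx}(k) = \gamma_{xx}^*(-k) \tag{2.49} \]

\[ \phi_{xy}(k) = \phi_{yx}^*(-k) \tag{2.50} \]

\[ \gamma_{xy}(k) = \gamma_{yx}^*(-k) \tag{2.51} \]

We may also note that

\[ \phi_{xx}(0) = E[|x(n)|^2] = \text{mean-square of } x(n) \tag{2.52} \]

\[ \gamma_{xx}(0) = \sigma_x^2 = \text{variance of } x(n) \tag{2.53} \]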

2.4.2 z-Transform Representations 


The z-transform of $\phi_{xx}(k)$ is given by

\[ \Phi_{xx}(z) = \sum_{k=-\infty}^{\infty} \phi_{xx}(k)z^{-k} \tag{2.54} \]

We note that a necessary condition for $\Phi_{xx}(z)$ to be convergent is that $m_x$ should be
zero (Problem P2.4). We assume this for the random processes that are considered in the
rest of this chapter, and also the following chapters. Exceptional cases will be mentioned 
explicitly. 

From Eq. (2.48), we note that

\[ \Phi_{xx}(z) = \Phi_{xx}^*(1/z^*) \tag{2.55} \]

Similarly, if $\Phi_{xy}(z)$ denotes the z-transform of $\phi_{xy}(k)$, then

\[ \Phi_{xy}(z) = \Phi_{yx}^*(1/z^*) \tag{2.56} \]

Equation (2.55) implies that if $\Phi_{xx}(z)$ is a rational function of z, its poles and
zeros must occur in complex-conjugate reciprocal pairs, as depicted in Figure 2.3.


Figure 2.3 Poles and zeros of a typical z-transform of an autocorrelation function. 


Moreover, Eq. (2.55) implies that the points which belong to the region of convergence
of $\Phi_{xx}(z)$ also occur in complex-conjugate reciprocal pairs. This in turn suggests that
the region of convergence of $\Phi_{xx}(z)$ must be of the form (Figure 2.3)

\[ |a| < |z| < \frac{1}{|a|} \tag{2.57} \]

It is important to note that this region includes the unit circle, |z| = 1.

The inverse z-transform relation (Eq. 2.20) may be used to evaluate $\phi_{xx}(0)$ as

\[ \phi_{xx}(0) = \frac{1}{2\pi j}\oint_C \Phi_{xx}(z)z^{-1}\,dz \tag{2.58} \]


We assume that $\Phi_{xx}(z)$ is convergent on the unit circle and select the unit circle as the
contour of integration. For this, we substitute z by $e^{j\omega}$. Then, $\omega$ changes from $-\pi$ to $+\pi$
as we traverse the unit circle once. Noting that $z^{-1}dz = j\,d\omega$, Eq. (2.58) becomes

\[ \phi_{xx}(0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{xx}(e^{j\omega})\,d\omega \tag{2.59} \]

Since $m_x = 0$, we can combine Eqs. (2.52) and (2.53) with Eq. (2.59) to obtain

\[ \sigma_x^2 = E[|x(n)|^2] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{xx}(e^{j\omega})\,d\omega \tag{2.60} \]
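
As a numerical illustration of this result, the Python sketch below starts from an assumed autocorrelation sequence of the form ρ^|k| (a hypothetical zero-mean process with unit variance, chosen here only because its autocorrelation decays quickly), evaluates its transform on a uniform grid over the unit circle, and averages over one period. The value of ρ, the truncation lag, and the grid size are arbitrary choices.

```python
import numpy as np

rho = 0.8                                   # hypothetical correlation coefficient
max_lag = 200                               # rho**200 is negligible
k = np.arange(-max_lag, max_lag + 1)
phi = rho ** np.abs(k)                      # assumed autocorrelation, phi(0) = 1

num_points = 1024
omega = -np.pi + 2 * np.pi * np.arange(num_points) / num_points

# Phi_xx(e^{jw}) = sum_k phi(k) e^{-jwk}, evaluated on the frequency grid
Phi = (np.exp(-1j * np.outer(omega, k)) @ phi).real

variance = np.mean(Phi)                     # (1/(2*pi)) * integral over one period
print(variance, Phi.min())
```

The average returns the lag-zero autocorrelation, that is, the variance of the process, and the transform itself stays positive at every frequency in this example.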


2.4.3 The Power Spectral Density 


The function $\Phi_{xx}(z)$, when evaluated on the unit circle, is the Fourier transform of the
autocorrelation sequence $\phi_{xx}(k)$. It is called the power spectral density since it reflects the
spectral content of the underlying process as a function of frequency. It is also called the power
spectrum or simply the spectrum. The convergence performance of an adaptive filter is directly
related to the spectrum of its input process. Next, we present a direct development of the 
power spectral density of a wide-sense stationary discrete-time stochastic process which 
reveals its important properties. 

Consider a sequence x(n) which represents a single realization of a zero-mean wide- 
sense stationary stochastic process {x(n)}. We consider a window of 2N + 1 elements of 
x(n) as 

$$x_N(n) = \begin{cases} x(n), & -N \le n \le N \\ 0, & \text{otherwise} \end{cases} \tag{2.61}$$


By definition, the discrete-time Fourier transform of $x_N(n)$ is

$$X_N(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x_N(n) e^{-j\omega n} = \sum_{n=-N}^{N} x(n) e^{-j\omega n} \tag{2.62}$$

Conjugating both sides of Eq. (2.62), and replacing n by m, we obtain 


$$X_N^*(e^{j\omega}) = \sum_{m=-N}^{N} x^*(m) e^{j\omega m} \tag{2.63}$$


Next, we multiply Eqs. (2.62) and (2.63) to obtain 


$$|X_N(e^{j\omega})|^2 = \sum_{n=-N}^{N} \sum_{m=-N}^{N} x(n) x^*(m) e^{-j\omega(n-m)} \tag{2.64}$$


Taking the expectation on both sides of Eq. (2.64), and interchanging the order of expec- 
tation and double summation, we get 


$$E[|X_N(e^{j\omega})|^2] = \sum_{n=-N}^{N} \sum_{m=-N}^{N} E[x(n) x^*(m)]\, e^{-j\omega(n-m)} \tag{2.65}$$


Noting that $E[x(n)x^*(m)] = \phi_{xx}(n - m)$, and letting $k = n - m$, we may rearrange the
terms in Eq. (2.65) to obtain

$$\frac{1}{2N+1} E[|X_N(e^{j\omega})|^2] = \sum_{k=-2N}^{2N} \left(1 - \frac{|k|}{2N+1}\right) \phi_{xx}(k)\, e^{-j\omega k} \tag{2.66}$$
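
The step from Eq. (2.65) to Eq. (2.66) rests on counting how many pairs $(n, m)$ share the same lag $k = n - m$, namely $2N + 1 - |k|$ of them. The short Python sketch below checks this rearrangement numerically. The autocorrelation sequence, the window half-length `N`, and the test frequency `omega` are arbitrary illustrative assumptions.

```python
import cmath

N = 4          # window half-length, so n and m run over -N..N
omega = 0.7    # arbitrary test frequency in rad/sample

def phi_xx(k):
    """Assumed autocorrelation, here phi_xx(k) = 0.8^|k| (illustrative only)."""
    return 0.8 ** abs(k)

# Right-hand side of Eq. (2.65): double sum over n and m
double_sum = sum(
    phi_xx(n - m) * cmath.exp(-1j * omega * (n - m))
    for n in range(-N, N + 1)
    for m in range(-N, N + 1)
)

# Right-hand side of Eq. (2.66): single sum over the lag k with triangular weights
single_sum = sum(
    (1 - abs(k) / (2 * N + 1)) * phi_xx(k) * cmath.exp(-1j * omega * k)
    for k in range(-2 * N, 2 * N + 1)
)

lhs = double_sum / (2 * N + 1)
print(abs(lhs - single_sum))   # difference should be at round-off level
```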


To simplify Eq. (2.66), we assume that $\phi_{xx}(k)$ is identically equal to zero for $|k|$ greater
than an arbitrarily large, but finite, constant. This, in general, is a fair assumption,
unless $\{x(n)\}$ contains sinusoidal components, in which case the summation on the
right-hand side of Eq. (2.66) will not be convergent. With this assumption, we get,
from Eq. (2.66)

$$\lim_{N \to \infty} \frac{1}{2N+1} E[|X_N(e^{j\omega})|^2] = \sum_{k=-\infty}^{\infty} \phi_{xx}(k)\, e^{-j\omega k} \tag{2.67}$$

which is nothing but the Fourier transform of the autocorrelation function, $\phi_{xx}(k)$. We
may, thus, write

$$\Phi_{xx}(e^{j\omega}) = \lim_{N \to \infty} \frac{1}{2N+1} E[|X_N(e^{j\omega})|^2] \tag{2.68}$$


The function $\Phi_{xx}(e^{j\omega})$ is called the power spectral density of the wide-sense
stationary stochastic process $\{x(n)\}$. It is defined as in Eq. (2.68) or, more conveniently, as the
Fourier transform of the autocorrelation function of $\{x(n)\}$,

$$\Phi_{xx}(e^{j\omega}) = \sum_{k=-\infty}^{\infty} \phi_{xx}(k)\, e^{-j\omega k} \tag{2.69}$$


The power spectral density possesses certain special properties. These are indicated 
below for our later reference. 


Property 1: When the limit in Eq. (2.68) exists, $\Phi_{xx}(e^{j\omega})$ has the following interpretation:

$$\frac{1}{2\pi} \Phi_{xx}(e^{j\omega})\, d\omega = \text{average contribution of the frequency components of } \{x(n)\} \text{ located between } \omega \text{ and } \omega + d\omega \tag{2.70}$$

This interpretation matches Eq. (2.60), if both sides of Eq. (2.70) are integrated over $\omega$
from $-\pi$ to $+\pi$. We will elaborate more on this later, once we introduce the response of
linear systems to random signals; see Example 2.6.


Property 2: The power spectral density $\Phi_{xx}(e^{j\omega})$ is always real and nonnegative.
This property is obvious from the definition (Eq. 2.68), as $|X_N(e^{j\omega})|^2$ is always real
and nonnegative.


Property 3: The power spectral density of a real-valued stationary stochastic process is
even, that is, symmetric with respect to the origin $\omega = 0$. In other words,

$$\Phi_{xx}(e^{j\omega}) = \Phi_{xx}(e^{-j\omega}) \tag{2.71}$$

However, this may not be true when the process is complex-valued.
This follows from Eq. (2.69), by replacing $k$ with $-k$ and noting that for a real-valued
stationary process $\phi_{xx}(k) = \phi_{xx}(-k)$.
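
Properties 2 and 3 can be illustrated with a small Python sketch. It assumes, purely for illustration, a real-valued first-order moving-average process $x(n) = v(n) + b\,v(n-1)$ driven by unit-variance white noise, for which $\phi_{xx}(0) = 1 + b^2$, $\phi_{xx}(\pm 1) = b$, and all other lags are zero. The coefficient `b` and the frequency grid are hypothetical choices.

```python
import cmath
import math

b = -0.9  # assumed MA(1) coefficient: x(n) = v(n) + b*v(n-1), v white, unit variance

# Autocorrelation of this real-valued process (zero for |k| > 1)
phi_xx = {-1: b, 0: 1.0 + b * b, 1: b}

def psd(omega):
    """Eq. (2.69) for the finite autocorrelation sequence above."""
    return sum(phi * cmath.exp(-1j * omega * k) for k, phi in phi_xx.items())

omegas = [-math.pi + 2.0 * math.pi * i / 64 for i in range(65)]
values = [psd(w) for w in omegas]

max_imag = max(abs(v.imag) for v in values)                       # Property 2: real
min_real = min(v.real for v in values)                            # Property 2: nonnegative
max_asym = max(abs(psd(w).real - psd(-w).real) for w in omegas)   # Property 3: even
print(max_imag, min_real, max_asym)
```

For this assumed process the spectrum is $1 + b^2 + 2b\cos\omega = |1 + b\,e^{-j\omega}|^2$, so the smallest value on the grid is small but not negative.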


Figure 2.4 A linear time-invariant system.


2.4.4 Response of Linear Systems to Stochastic Processes


We consider a linear time-invariant discrete-time system, with input $\{x(n)\}$, output $\{y(n)\}$
and impulse response $h(n)$, as depicted in Figure 2.4. The input and output sequences are
stochastic processes, but the system impulse response, $h(n)$, is a deterministic sequence.
Since $\{x(n)\}$ and $\{y(n)\}$ are stochastic processes, we are interested in finding how they
are statistically related. We assume that $\phi_{xx}(k)$ is known, and find the relationships
which relate this with $\phi_{xy}(k)$ and $\phi_{yy}(k)$. These relationships can be conveniently
established through the z-transforms of the sequences.
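We note that

$$\begin{aligned}
\Phi_{xy}(z) &= \sum_{k=-\infty}^{\infty} \phi_{xy}(k) z^{-k} \\
&= \sum_{k=-\infty}^{\infty} E[x(n) y^*(n-k)]\, z^{-k} \\
&= \sum_{k=-\infty}^{\infty} E\left[x(n) \sum_{l=-\infty}^{\infty} h^*(l) x^*(n-k-l)\right] z^{-k}
\end{aligned} \tag{2.72}$$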


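Since both summation and expectation are linear operators, their orders can be
interchanged. Using this in Eq. (2.72), we get

$$\begin{aligned}
\Phi_{xy}(z) &= \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} h^*(l)\, E[x(n) x^*(n-k-l)]\, z^{-k} \\
&= \sum_{l=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} h^*(l)\, \phi_{xx}(k+l)\, z^{-k}
\end{aligned} \tag{2.73}$$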


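If we substitute $k + l$ by $m$, we get

$$\begin{aligned}
\Phi_{xy}(z) &= \sum_{l=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} h^*(l)\, \phi_{xx}(m)\, z^{-(m-l)} \\
&= \sum_{l=-\infty}^{\infty} h^*(l) z^{l} \sum_{m=-\infty}^{\infty} \phi_{xx}(m) z^{-m}
\end{aligned} \tag{2.74}$$

This gives

$$\Phi_{xy}(z) = H^*(1/z^*)\, \Phi_{xx}(z) \tag{2.75}$$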


where $H(z)$ follows the conventional definition

$$H(z) = \sum_{n=-\infty}^{\infty} h(n) z^{-n} \tag{2.76}$$


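Furthermore, using Eqs. (2.55), (2.56), and (2.75), we can also get

$$\Phi_{yx}(z) = H(z)\, \Phi_{xx}(z) \tag{2.77}$$

The autocorrelation of the output process $\{y(n)\}$ is obtained as follows:

$$\begin{aligned}
\Phi_{yy}(z) &= \sum_{k=-\infty}^{\infty} \phi_{yy}(k) z^{-k} \\
&= \sum_{k=-\infty}^{\infty} E[y(n) y^*(n-k)]\, z^{-k} \\
&= \sum_{k=-\infty}^{\infty} E\left[\sum_{l=-\infty}^{\infty} h(l) x(n-l) \sum_{m=-\infty}^{\infty} h^*(m) x^*(n-k-m)\right] z^{-k} \\
&= \sum_{l=-\infty}^{\infty} h(l) \sum_{m=-\infty}^{\infty} h^*(m) \sum_{k=-\infty}^{\infty} E[x(n-l) x^*(n-k-m)]\, z^{-k} \\
&= \sum_{l=-\infty}^{\infty} h(l) \sum_{m=-\infty}^{\infty} h^*(m) \sum_{k=-\infty}^{\infty} \phi_{xx}(k+m-l)\, z^{-k}
\end{aligned} \tag{2.78}$$

Substituting $k + m - l$ by $p$, we get

$$\Phi_{yy}(z) = \sum_{l=-\infty}^{\infty} h(l) z^{-l} \sum_{m=-\infty}^{\infty} h^*(m) z^{m} \sum_{p=-\infty}^{\infty} \phi_{xx}(p) z^{-p} \tag{2.79}$$

or

$$\Phi_{yy}(z) = H(z)\, H^*(1/z^*)\, \Phi_{xx}(z) \tag{2.80}$$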
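
The relations (2.75), (2.77), and (2.80) can be checked numerically at an arbitrary point of the z-plane. The Python sketch below does this for a short complex-valued FIR system and an input with a finite-length autocorrelation. The impulse response `h`, the coefficient `c`, and the test point `z` are all hypothetical values chosen only for illustration.

```python
h = [1.0, 0.5 - 0.2j, 0.25j]     # assumed FIR impulse response h(0), h(1), h(2)
c = 0.4 + 0.3j                   # assumed input model: x(n) = v(n) + c*v(n-1), v white
phi_xx = {-1: c.conjugate(), 0: 1.0 + abs(c) ** 2, 1: c}

def phi(k):
    """Input autocorrelation phi_xx(k); zero outside |k| <= 1."""
    return phi_xx.get(k, 0.0)

def ztransform(seq, z):
    """Two-sided z-transform of a finite sequence given as {k: value}."""
    return sum(val * z ** (-k) for k, val in seq.items())

def H(z):
    """System function of Eq. (2.76) for the FIR response above."""
    return sum(h[n] * z ** (-n) for n in range(len(h)))

taps = range(len(h))
lags = range(-4, 5)   # wide enough to hold every nonzero correlation lag

# Time-domain correlations, as in the last lines of Eqs. (2.73) and (2.78)
phi_xy = {k: sum(h[l].conjugate() * phi(k + l) for l in taps) for k in lags}
phi_yx = {k: sum(h[l] * phi(k - l) for l in taps) for k in lags}
phi_yy = {k: sum(h[l] * h[m].conjugate() * phi(k + m - l) for l in taps for m in taps)
          for k in lags}

z = 0.9 * complex(0.6, 0.8)                  # arbitrary test point with |z| = 0.9
Phi_xx = ztransform(phi_xx, z)
H_star = H(1 / z.conjugate()).conjugate()    # H*(1/z*)

err_xy = abs(ztransform(phi_xy, z) - H_star * Phi_xx)          # Eq. (2.75)
err_yx = abs(ztransform(phi_yx, z) - H(z) * Phi_xx)            # Eq. (2.77)
err_yy = abs(ztransform(phi_yy, z) - H(z) * H_star * Phi_xx)   # Eq. (2.80)
print(err_xy, err_yx, err_yy)
```

All three differences should be at round-off level. The expression used for `phi_yx` is the time-domain counterpart of Eq. (2.77), that is, $\phi_{yx}(k) = \sum_l h(l)\phi_{xx}(k - l)$.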


It is also convenient to assume that $z$ varies only over the unit circle, that
is, $|z| = 1$. In that case $1/z^* = z$ and Eqs. (2.75) and (2.80) simplify to

$$\Phi_{xy}(z) = H^*(z)\, \Phi_{xx}(z) \tag{2.81}$$

and

$$\Phi_{yy}(z) = H(z) H^*(z)\, \Phi_{xx}(z) = |H(z)|^2\, \Phi_{xx}(z) \tag{2.82}$$


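respectively. Also, by replacing $z$ with $e^{j\omega}$, we obtain

$$\Phi_{xy}(e^{j\omega}) = H^*(e^{j\omega})\, \Phi_{xx}(e^{j\omega}) \tag{2.83}$$
$$\Phi_{yx}(e^{j\omega}) = H(e^{j\omega})\, \Phi_{xx}(e^{j\omega}) \tag{2.84}$$
$$\Phi_{yy}(e^{j\omega}) = |H(e^{j\omega})|^2\, \Phi_{xx}(e^{j\omega}) \tag{2.85}$$

These equations show how the cross power spectral densities, $\Phi_{xy}(e^{j\omega})$ and $\Phi_{yx}(e^{j\omega})$, and
also the output power spectral density, $\Phi_{yy}(e^{j\omega})$, are related to the input power spectral
density, $\Phi_{xx}(e^{j\omega})$, and the system transfer function, $H(e^{j\omega})$. As a useful application of
the above results, we consider the following example.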


Example 2.6 


Consider a band-pass filter with a magnitude response as in Figure 2.5. The input process
to the filter is a zero-mean wide-sense stationary stochastic process $\{x(n)\}$. We are
interested in finding the variance of the filter output, $\{y(n)\}$.

Figure 2.5 Magnitude response of a band-pass filter.

We note that

$$|H(e^{j\omega})| = \begin{cases} 1, & \omega_1 \le \omega \le \omega_2 \\ 0, & \text{otherwise} \end{cases} \tag{2.86}$$


Substituting this in Eq. (2.85) and using an equation similar to Eq. (2.60) for $\{y(n)\}$, we
obtain

$$\sigma_y^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, \Phi_{xx}(e^{j\omega})\, d\omega = \frac{1}{2\pi} \int_{\omega_1}^{\omega_2} \Phi_{xx}(e^{j\omega})\, d\omega \tag{2.87}$$

If $\omega_2$ approaches $\omega_1$, then we may write $\omega_2 - \omega_1 = d\omega$, where $d\omega$ is a variable
approaching zero. In that case, we may write

$$\sigma_y^2 = \frac{1}{2\pi} \Phi_{xx}(e^{j\omega_1})\, d\omega.$$

This proves the interpretation of the power spectral density given by Property 1 in
Section 2.4.3, that is, Eq. (2.70).
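
The narrow-band limit used in this example can also be observed numerically. The following Python sketch assumes a hypothetical input with the closed-form spectrum $\Phi_{xx}(e^{j\omega}) = 1 + b^2 + 2b\cos\omega$ (a first-order moving-average process driven by unit-variance white noise) and compares the band power of Eq. (2.87) with the approximation $\frac{1}{2\pi}\Phi_{xx}(e^{j\omega_1})\,d\omega$ as the band shrinks. The values of `b`, `w1`, and the band widths are illustrative assumptions.

```python
import math

b = 0.5   # assumed MA(1) input: x(n) = v(n) + b*v(n-1), v white with unit variance

def psd(omega):
    """Closed-form power spectral density of the assumed input."""
    return 1.0 + b * b + 2.0 * b * math.cos(omega)

def band_power(w_low, w_high, steps=1000):
    """Eq. (2.87): (1/2pi) * integral of psd from w_low to w_high (midpoint rule)."""
    dw = (w_high - w_low) / steps
    return sum(psd(w_low + (i + 0.5) * dw) for i in range(steps)) * dw / (2.0 * math.pi)

w1 = 1.0
results = []
for width in (0.5, 0.05, 0.005):
    exact = band_power(w1, w1 + width)
    approx = psd(w1) * width / (2.0 * math.pi)    # narrow-band expression in the text
    results.append((width, exact, approx, abs(exact - approx) / exact))

for row in results:
    print(row)

# With the pass band widened to the full range, Eq. (2.60) is recovered: 1 + b^2
total = band_power(-math.pi, math.pi)
print(total)
```

The relative error in the last column shrinks roughly in proportion to the band width, which is the behavior that the limiting argument relies on.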


Consider the case where there is a third process, {d(n)}, whose cross-correlation with 
the input process, {x(n)}, of Figure 2.4 is known. We are interested in finding the cross- 
correlation of {d(n)} and {y(n)}. 

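In terms of z-transforms, we have

$$\begin{aligned}
\Phi_{dy}(z) &= \sum_{k=-\infty}^{\infty} \phi_{dy}(k) z^{-k} \\
&= \sum_{k=-\infty}^{\infty} E[d(n) y^*(n-k)]\, z^{-k} \\
&= \sum_{k=-\infty}^{\infty} E\left[d(n) \sum_{l=-\infty}^{\infty} h^*(l) x^*(n-k-l)\right] z^{-k} \\
&= \sum_{l=-\infty}^{\infty} h^*(l) \sum_{k=-\infty}^{\infty} \phi_{dx}(k+l)\, z^{-k}
\end{aligned} \tag{2.88}$$

Substituting $k + l$ by $m$, we obtain

$$\Phi_{dy}(z) = \sum_{l=-\infty}^{\infty} h^*(l) z^{l} \sum_{m=-\infty}^{\infty} \phi_{dx}(m) z^{-m} \tag{2.89}$$

or

$$\Phi_{dy}(z) = H^*(1/z^*)\, \Phi_{dx}(z) \tag{2.90}$$

Also, using Eq. (2.56), we can get, from Eq. (2.90)

$$\Phi_{yd}(z) = H(z)\, \Phi_{xd}(z) \tag{2.91}$$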


2.4.5 Ergodicity and Time Averages 


Estimation of stochastic averages, as was suggested above, requires a large number of 
realizations (sample sequences) of the underlying stochastic processes, even under the 
condition that the processes are stationary. This is not feasible, in practice, where usually 
only a single realization of each stochastic process is available. In that case, we have no 
choice, but to use time averages to estimate the desired ensemble averages. Then, a fun- 
damental question that arises is the following. Under what condition(s) do time averages 
become equal to ensemble averages? As one may intuitively understand, it turns out that 
under rather mild conditions, the only requirement for the time and ensemble averages to 
be the same is that the corresponding stochastic process be stationary (Papoulis, 1991). 

A stationary stochastic process {x(n)} is said to be ergodic if its ensemble averages are
equal to time averages. Ergodicity is usually defined for specific averages. For example,
we may come across terms such as mean-ergodic or correlation-ergodic. In adaptive
filter theory, it is always assumed that all the underlying processes are ergodic in the
strict sense. This means that all averages can be obtained by time averages. We make this
assumption throughout this book, whenever necessary.
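
To make the idea of replacing ensemble averages by time averages concrete, the Python sketch below estimates a few autocorrelation lags from a single realization. The process model $x(n) = v(n) + b\,v(n-1)$ with Gaussian white $v(n)$, the seed, and the record length are assumptions made only for this illustration; for this model the ensemble averages are $\phi_{xx}(0) = 1 + b^2$, $\phi_{xx}(1) = b$, and $\phi_{xx}(2) = 0$.

```python
import random

random.seed(0)            # fixed seed so that the run is repeatable
b = 0.5                   # assumed model: x(n) = v(n) + b*v(n-1), v white, unit variance
num_samples = 100_000

v = [random.gauss(0.0, 1.0) for _ in range(num_samples + 1)]
x = [v[n] + b * v[n - 1] for n in range(1, num_samples + 1)]

def time_average_autocorr(seq, k):
    """Time-average estimate of phi_xx(k) = E[x(n) x(n-k)] from one realization."""
    return sum(seq[n] * seq[n - k] for n in range(k, len(seq))) / (len(seq) - k)

estimates = [time_average_autocorr(x, k) for k in range(3)]
ensemble = [1.0 + b * b, b, 0.0]    # ensemble averages for the assumed model
print(estimates)
print(ensemble)
```

The estimates fluctuate around the ensemble values, with a spread that decreases as the record length grows.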


Problems 


P2.1 Find the z-transform and its region of convergence of the following sequences 


(i) 
ia <0 
a 0.7", >0 
(ii) 
QF. n<0O 
x(n) = 40.7", O<n<5 
0.5" n>5 


P2.2 Find the inverse z-transform of Eq. (2.30) when 


(i) its region of convergence is region I of Figure 2.1. 
(ii) its region of convergence is region II of Figure 2.1. 
(iii) its region of convergence is region III of Figure 2.1. 


P2.3 Use the basic definitions of the correlation and covariance functions to prove the
symmetry properties (Eq. 2.48) to (Eq. 2.51).


P2.4 Consider a stationary stochastic process,

$$x(n) = v(n) + \sin(\omega_0 n + \theta)$$

where $\{v(n)\}$ is a stationary white noise, $\omega_0$ is a fixed angular frequency, and $\theta$
is a random phase which is uniformly distributed in the interval $-\pi \le \theta \le \pi$,
but constant for each realization of $\{x(n)\}$. Find the autocorrelation function of
$\{x(n)\}$ and show that $\Phi_{xx}(z)$ has no region of convergence in the z-plane.


P2.5 Consider a stationary stochastic process, $\{x(n)\}$, with mean $m_x \neq 0$.

(i) Show that $\Phi_{xx}(z)$ contains a summation which is not convergent for any
value of the complex variable $z$.
(ii) Consider the case where $|z| = 1$, that is, $z = e^{j\omega}$, for $0 \le \omega \le 2\pi$. For this
case, argue that $X(e^{j\omega})$ in the vicinity of $\omega = 0$ is an impulse with the
magnitude of $m_x$.
(iii) What is the power spectral density $\Phi_{xx}(e^{j\omega})$ in the vicinity of $\omega = 0$?
Hint: To answer this part, you may evaluate $\phi_{xx}(k)$ first.


P2.6 In Problem P2.5, the nonzero mean of x(n) may be interpreted as the presence of
a tone at $\omega = 0$. Repeat Problem P2.5 when x(n) contains a tone at an arbitrary
frequency $\omega = \omega_0$.


P2.7 Prove the symmetry equations (2.55) and (2.56).


P2.8 A stationary unit-variance white noise process, $\{v(n)\}$, is passed through a linear
time-invariant system with the system function

$$H(z) = \frac{1}{1 - a z^{-1}}, \quad |a| < 1.$$

If the system output is referred to as $\{u(n)\}$, find the following:

(i) $\Phi_{vu}(z)$ and $\Phi_{uu}(z)$.
(ii) The cross-correlation and autocorrelation functions $\phi_{vu}(k)$ and $\phi_{uu}(k)$.
(iii) Variance of $\{u(n)\}$.


P2.9 Repeat P2.8 when

$$H(z) = \frac{1}{(1 - a z^{-1})(1 - b z^{-1})}, \quad |a| \text{ and } |b| < 1.$$

Find the answers for the two cases when $a = b$ and $a \neq b$.


P2.10 Repeat P2.8 when H(z) is a finite-impulse response system with

$$H(z) = \sum_{n=0}^{N-1} h(n) z^{-n}.$$


P2.11 Work out the details of derivation of Eq. (2.66) from Eq. (2.65).


P2.12 Show that for a linear time-invariant system with input {x(n)}, output {y(n)}, and
system function H(z)

$$\Phi_{yx}(z) = H(z)\, \Phi_{xx}(z).$$

Also, if {d(n)} is a third process

$$\Phi_{yd}(z) = H(z)\, \Phi_{xd}(z).$$


P2.13 Write the following z-transform relations in terms of the time series h(n) and the
correlation functions:

(i) $\Phi_{yx}(z) = H(z)\, \Phi_{xx}(z)$.
(ii) $\Phi_{xy}(z) = H^*(1/z^*)\, \Phi_{xx}(z)$.
(iii) $\Phi_{yd}(z) = H(z)\, \Phi_{xd}(z)$.
(iv) $\Phi_{dy}(z) = H^*(1/z^*)\, \Phi_{dx}(z)$.


P2.14 Consider the system shown in Figure P2.14. The input processes, $\{u(n)\}$ and
$\{v(n)\}$, are zero-mean and uncorrelated with each other. Derive the relationships
which relate $\Phi_{uu}(z)$, $\Phi_{vv}(z)$, $H(z)$, and $G(z)$ with the following functions:


(i) Py (z). 


(ii) ,,(z). 
(iii) ,,,(z). 


Figure P2.14


P2.15 Consider the system shown in Figure P2.15. The input, {v(n)}, is a stationary
zero-mean unit-variance white noise process. Show that


‘ = 1 

O Pal) = aoao 
Gi) $p) = 4. 

(iii) ®,,(2) = RE. 


Figure P2.15


P2.16 Consider the system shown in Figure P2.16. The input, $\{v(n)\}$, is a stationary
zero-mean unit-variance white noise process. Show that

(i) $\phi_{xy}(m) = \sum_{l} h(l + m)\, g^*(l)$.
(ii) $\Phi_{xy}(z) = H(z)\, G^*(1/z^*)$.

where h(n) and g(n) are the impulse responses of the subsystems H(z) and G(z),
respectively.


Figure P2.16


P2.17 Consider the random process $\{u(n)\}$, where $u(n) = x(n) + y(n)$, and $\{x(n)\}$ and
$\{y(n)\}$ are the random processes that were introduced in Problem P2.16. Derive
an expression for $\Phi_{uu}(z)$ in terms of $\Phi_{xx}(z)$, $\Phi_{yy}(z)$ and $\Phi_{xy}(z)$,

(i) through direct use of the equation $u(n) = x(n) + y(n)$.
(ii) by noting that $\{u(n)\}$ is the output of a system with input $\{v(n)\}$ and transfer
function $H(z) + G(z)$.
(iii) Confirm that the results obtained in Parts (i) and (ii) are similar.


3 


Wiener Filters 


In this chapter, we study a class of optimum linear filters known as Wiener filters. As we 
will see in later chapters, the concept of Wiener filters is essential as well as helpful to 
understand and appreciate adaptive filters. Furthermore, Wiener filtering is general and 
applicable to any application that involves linear estimation of a desired signal sequence 
from another related sequence. Applications such as prediction, smoothing, joint process 
estimation, and channel equalization (deconvolution) are all covered by Wiener filters. 

We study Wiener filters by looking at them from different angles. We first develop the 
theory of causal transversal Wiener filters for the case of discrete-time real-valued signals. 
This will then be extended to the case of complex-valued signals. Our discussion follows 
with a study of unconstrained Wiener filters. The term unconstrained signifies that the 
filter impulse response is allowed to be noncausal and infinite in duration. The study of 
unconstrained Wiener filters is very instructive as it reveals many important aspects of 
Wiener filters, which otherwise would be difficult to see. 

In the theory of Wiener filters, the underlying signals are assumed to be random pro- 
cesses, and the filter design is done using the statistics obtained by ensemble averaging. 
We follow this approach while doing the theoretical development and analysis of Wiener 
filters. However, from the implementation point of view and, in particular, while devel- 
oping adaptive algorithms in later chapters, we have to consider the use of time averages 
instead of ensemble averages. Adoption of this approach in the development of Wiener 
filters is also possible, once we assume all the underlying processes are ergodic; that is, 
their time and ensemble averages are the same (Section 2.4.5). 


3.1 Mean-Squared Error Criterion 


Figure 3.1 Block diagram of a filtering problem.

Figure 3.1 shows the block schematic of a linear discrete-time filter W(z) in the context
of estimating a desired signal d(n) based on an excitation x(n). Here, we assume that
both x(n) and d(n) are samples of infinite-length random processes. The filter output is
y(n), and e(n) is the estimation error. Clearly, the smaller the estimation error, the better
the filter performance. As the error approaches zero, the output of the filter approaches
the desired signal, d(n). Hence, the question that arises is the following: what is the most
appropriate choice for the parameters of the filter, which would result in the smallest
possible estimation error? To a certain extent, the statement of this question itself gives
us some hints on the choice of the filter parameters. As we want the estimation error to
be as small as possible, a straightforward approach to the design of the filter parameters
appears to be “to choose an appropriate function of this estimation error as a cost function
and select that set of the filter parameters which optimizes this cost function in some
sense.” This is indeed the philosophy that underlies almost all filter design approaches.
The various details of this design principle will become clear as we go along. Commonly
used synonyms for cost function are performance function and performance surface.

In choosing a performance function, the following points have to be considered:


1. The performance function must be mathematically tractable. 
2. The performance function should preferably have a single minimum (or maximum) 
point, so that the optimum set of filter parameters could be selected unambiguously. 


The tractability of the performance function is essential as it permits the analysis of the 
filter and also greatly simplifies the development of adaptive algorithms for adjustment of 
the filter parameters. The number of minima (or maxima) points for a performance func- 
tion is closely related to the filter structure. The recursive (infinite-impulse response — IIR) 
filters, in general, result in performance functions that may have many minima (or maxima) 
points, whereas the non-recursive (finite-impulse response — FIR) filters are guaranteed 
to have a single global minimum (or maximum) point if a proper performance function 
is used. Because of this, application of the IIR filters in adaptive filtering has been very 
limited. In this book, also, with the exception of a few cases, our discussion is limited to 
the FIR-adaptive filters. 

In Wiener filters, the performance function is chosen to be

$$\xi = E[|e(n)|^2] \tag{3.1}$$

where $E[\cdot]$ denotes the statistical expectation. In fact, the performance function $\xi$, which
is also called the mean-squared error criterion, turns out to be the simplest possible function
that satisfies the two requirements noted above. It can easily be handled mathematically,
and in many cases of interest, it has a single global minimum. In particular, in the case of
FIR filters, the performance function $\xi$ is a hyperparaboloid (bowl shaped) with a single
minimum point, which can easily be calculated using the second-order statistics of the
underlying random processes.
It is instructive to note that a possible generalization of the mean-squared error criterion
(3.1) is

$$\xi_p = E[|e(n)|^p] \tag{3.2}$$

where $p$ takes the integer values 1, 2, 3, .... Clearly, the case of $p = 2$ leads to the Wiener
filter performance function, which is defined above. Cases where $p > 2$, with $p$ being
even, may result in more than one minimum and/or maximum point. Furthermore, the
case of odd $p$ turns out to be difficult to handle mathematically because of the modulus
sign on e(n).


3.2 Wiener Filter – Transversal, Real-Valued Case


Consider a transversal filter which is shown in Figure 3.2. The filter input, x(n), and its
desired output, d(n), are assumed to be real-valued stationary processes. The filter tap
weights, $w_0, w_1, \ldots, w_{N-1}$, are also assumed to be real-valued. The filter tap-weight and
input vectors are defined, respectively, as the column vectors

$$\mathbf{w} = [w_0 \;\; w_1 \;\; \cdots \;\; w_{N-1}]^T \tag{3.3}$$

and

$$\mathbf{x}(n) = [x(n) \;\; x(n-1) \;\; \cdots \;\; x(n-N+1)]^T \tag{3.4}$$

where superscript T stands for transpose.

Figure 3.2 A transversal filter.

The filter output is

$$y(n) = \sum_{i=0}^{N-1} w_i x(n-i) = \mathbf{w}^T \mathbf{x}(n) \tag{3.5}$$

which can also be written as

$$y(n) = \mathbf{x}^T(n)\, \mathbf{w} \tag{3.6}$$

because $\mathbf{w}^T\mathbf{x}(n)$ is a scalar and thus it is equal to its transpose, that is, $\mathbf{w}^T\mathbf{x}(n) =
(\mathbf{w}^T\mathbf{x}(n))^T = \mathbf{x}^T(n)\mathbf{w}$. Thus, we may write


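$$\begin{aligned}
e(n) &= d(n) - y(n) \\
&= d(n) - \mathbf{w}^T \mathbf{x}(n) \\
&= d(n) - \mathbf{x}^T(n)\, \mathbf{w}
\end{aligned} \tag{3.7}$$

Using Eq. (3.7) in Eq. (3.1), we get

$$\xi = E[e^2(n)] = E[(d(n) - \mathbf{w}^T\mathbf{x}(n))(d(n) - \mathbf{x}^T(n)\mathbf{w})] \tag{3.8}$$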


Expanding the right-hand side of Eq. (3.8) and noting that $\mathbf{w}$ can be shifted out of the
expectation operator, $E[\cdot]$, because it is not a statistical variable, we obtain

$$\xi = E[d^2(n)] - \mathbf{w}^T E[\mathbf{x}(n) d(n)] - E[d(n)\mathbf{x}^T(n)]\, \mathbf{w} + \mathbf{w}^T E[\mathbf{x}(n)\mathbf{x}^T(n)]\, \mathbf{w} \tag{3.9}$$

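Next, if we define the N-by-1 cross-correlation vector

$$\mathbf{p} = E[\mathbf{x}(n) d(n)] = [p_0 \;\; p_1 \;\; \cdots \;\; p_{N-1}]^T \tag{3.10}$$

and the N-by-N autocorrelation matrix

$$\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)] = \begin{bmatrix}
r_{00} & r_{01} & r_{02} & \cdots & r_{0,N-1} \\
r_{10} & r_{11} & r_{12} & \cdots & r_{1,N-1} \\
r_{20} & r_{21} & r_{22} & \cdots & r_{2,N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{N-1,0} & r_{N-1,1} & r_{N-1,2} & \cdots & r_{N-1,N-1}
\end{bmatrix} \tag{3.11}$$

and note that $E[d(n)\mathbf{x}^T(n)] = \mathbf{p}^T$, and also $\mathbf{w}^T\mathbf{p} = \mathbf{p}^T\mathbf{w}$, we obtain

$$\xi = E[d^2(n)] - 2\mathbf{w}^T\mathbf{p} + \mathbf{w}^T\mathbf{R}\mathbf{w} \tag{3.12}$$

This is a quadratic function of the tap-weight vector $\mathbf{w}$ with a single global minimum.¹
We will give the full details of this function in Chapter 4.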

¹ It may be noted that for Eq. (3.12) to correspond to a convex quadratic surface, so that it has a unique minimum
point, and not a saddle point, $\mathbf{R}$ has to be a positive definite matrix. This point, which is not addressed here, will be
examined in detail in Chapter 4.

To obtain the set of tap weights that minimizes the performance function $\xi$, we need
to solve the system of equations that results from setting the partial derivatives of $\xi$ with
respect to every tap weight to zero. That is,

$$\frac{\partial \xi}{\partial w_i} = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.13}$$

These equations may collectively be written as

$$\nabla \xi = \mathbf{0} \tag{3.14}$$

where $\nabla$ is the gradient operator defined as the column vector

$$\nabla = \left[\frac{\partial}{\partial w_0} \;\; \frac{\partial}{\partial w_1} \;\; \cdots \;\; \frac{\partial}{\partial w_{N-1}}\right]^T \tag{3.15}$$

and $\mathbf{0}$ on the right-hand side of Eq. (3.14) denotes the column vector consisting of N zeros.
To find the partial derivatives of $\xi$ with respect to the filter tap weights, we first expand
Eq. (3.12) as

$$\xi = E[d^2(n)] - 2\sum_{l=0}^{N-1} p_l w_l + \sum_{l=0}^{N-1} \sum_{m=0}^{N-1} w_l w_m r_{lm} \tag{3.16}$$


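Also, we note that the double summation on the right-hand side of Eq. (3.16) may be
expanded as

$$\sum_{l=0}^{N-1} \sum_{m=0}^{N-1} w_l w_m r_{lm} = \sum_{\substack{l=0 \\ l \neq i}}^{N-1} \sum_{\substack{m=0 \\ m \neq i}}^{N-1} w_l w_m r_{lm} + w_i \sum_{\substack{l=0 \\ l \neq i}}^{N-1} w_l r_{li} + w_i \sum_{\substack{m=0 \\ m \neq i}}^{N-1} w_m r_{im} + w_i^2 r_{ii} \tag{3.17}$$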


Substituting Eq. (3.17) in Eq. (3.16), taking the partial derivative of $\xi$ with respect to $w_i$ and
replacing $m$ by $l$, we obtain

$$\frac{\partial \xi}{\partial w_i} = -2p_i + \sum_{l=0}^{N-1} w_l (r_{li} + r_{il}), \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.18}$$
To simplify this, we note that

$$r_{li} = E[x(n-l)x(n-i)] = \phi_{xx}(i - l) \tag{3.19}$$

where $\phi_{xx}(i - l)$ is the autocorrelation function of x(n) for lag $i - l$. Similarly,

$$r_{il} = \phi_{xx}(l - i) \tag{3.20}$$

Considering the symmetry property of the autocorrelation function, that is, $\phi_{xx}(k) =
\phi_{xx}(-k)$, we get

$$r_{li} = r_{il} \tag{3.21}$$


Substituting Eq. (3.21) in Eq. (3.18), we obtain

$$\frac{\partial \xi}{\partial w_i} = 2\sum_{l=0}^{N-1} r_{il} w_l - 2p_i, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.22}$$

which can be expressed using matrix notation as


$$\nabla \xi = 2\mathbf{R}\mathbf{w} - 2\mathbf{p} \tag{3.23}$$

Letting $\nabla \xi = \mathbf{0}$ gives the following equation from which the optimum set of the Wiener
filter tap weights can be obtained.

$$\mathbf{R}\mathbf{w}_o = \mathbf{p} \tag{3.24}$$

Note that we have added the subscript “o” to $\mathbf{w}$ to emphasize that it is the optimum
tap-weight vector. Equation (3.24), which is known as the Wiener-Hopf equation, has the
following solution:

$$\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p} \tag{3.25}$$

assuming that $\mathbf{R}$ has an inverse.
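
The gradient expression $2\mathbf{R}\mathbf{w} - 2\mathbf{p}$ and the fact that it vanishes at $\mathbf{w}_o$ can be checked with finite differences. The Python sketch below does this for a two-tap case. The numerical values of `R`, `p`, and `d_power` are hypothetical and serve only as a test case.

```python
R = [[1.0, 0.5], [0.5, 1.0]]   # assumed autocorrelation matrix (symmetric)
p = [0.7, 0.2]                 # assumed cross-correlation vector
d_power = 2.0                  # assumed E[d^2(n)]

def xi(w):
    """Performance function of Eq. (3.12)."""
    n = len(w)
    quad = sum(w[i] * R[i][j] * w[j] for i in range(n) for j in range(n))
    return d_power - 2.0 * sum(w[i] * p[i] for i in range(n)) + quad

def gradient(w):
    """Closed-form gradient, 2Rw - 2p."""
    n = len(w)
    return [2.0 * sum(R[i][j] * w[j] for j in range(n)) - 2.0 * p[i] for i in range(n)]

def numerical_gradient(w, step=1e-6):
    """Central-difference approximation of the partial derivatives of xi."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        down = list(w)
        up[i] += step
        down[i] -= step
        grad.append((xi(up) - xi(down)) / (2.0 * step))
    return grad

w_test = [0.3, -1.2]
print(gradient(w_test), numerical_gradient(w_test))

# Solution of R w_o = p for the 2-by-2 case, written out with Cramer's rule
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
w_o = [(p[0] * R[1][1] - R[0][1] * p[1]) / det,
       (R[0][0] * p[1] - R[1][0] * p[0]) / det]
print(w_o, gradient(w_o))      # the gradient at w_o should be (numerically) zero
```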
Replacing $\mathbf{w}$ by $\mathbf{w}_o$ and $\mathbf{R}\mathbf{w}_o$ by $\mathbf{p}$ in Eq. (3.12), we obtain

$$\begin{aligned}
\xi_{\min} &= E[d^2(n)] - \mathbf{w}_o^T \mathbf{p} \\
&= E[d^2(n)] - \mathbf{w}_o^T \mathbf{R} \mathbf{w}_o
\end{aligned} \tag{3.26}$$

This is the minimum mean-squared error that can be achieved by the transversal Wiener
filter W(z) and is obtained when its tap weights are chosen according to the optimum
solution given by Eq. (3.25).
For our later reference, we may also note that by substituting Eq. (3.25) in Eq. (3.26),
we obtain

$$\xi_{\min} = E[d^2(n)] - \mathbf{p}^T \mathbf{R}^{-1} \mathbf{p} \tag{3.27}$$
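
The following Python sketch shows one way the Wiener-Hopf equation might be solved for a general number of taps. The scenario is an assumption made for illustration: a colored input with $\phi_{xx}(k) = 0.5^{|k|}$, a desired signal $d(n) = 2x(n) + 3x(n-1) + v(n)$, and white noise $v(n)$ of variance 0.1 that is uncorrelated with $x(n)$. The helper `solve_linear` is a plain Gaussian elimination routine written for this sketch, not a library function.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                # back substitution
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def phi(k):
    """Assumed input autocorrelation phi_xx(k) = 0.5^|k|."""
    return 0.5 ** abs(k)

N = 3             # number of filter taps (one more than the assumed plant needs)
sigma_v2 = 0.1    # assumed noise variance

R = [[phi(i - l) for l in range(N)] for i in range(N)]      # r_il = phi_xx(l - i)
p = [2.0 * phi(i) + 3.0 * phi(i - 1) for i in range(N)]     # p_i = E[x(n-i) d(n)]
d_power = 4.0 * phi(0) + 9.0 * phi(0) + 12.0 * phi(1) + sigma_v2   # E[d^2(n)]

w_o = solve_linear(R, p)                                    # Eq. (3.25)
xi_min = d_power - sum(w * pi for w, pi in zip(w_o, p))     # Eq. (3.26)
print(w_o, xi_min)
```

Under these assumptions the optimum weights reproduce the plant coefficients, with the extra tap set to zero, and the minimum mean-squared error equals the noise variance.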


Example 3.1 


Consider the modeling problem shown in Figure 3.3. The plant is a two-tap filter with
an additive noise, $v(n)$, added to its output. A two-tap Wiener filter with tap weights
$w_0$ and $w_1$ is used to model the plant parameters. The same input is applied to both the
plant and Wiener filter. The input, x(n), is a stationary white process with variance of
unity. The additive noise, $v(n)$, is zero-mean and uncorrelated with x(n), and its variance
is $\sigma_v^2 = 0.1$. We want to compute the optimum values of $w_0$ and $w_1$, which minimize
$E[e^2(n)]$.


Figure 3.3 A modeling problem.

We need to compute $\mathbf{R}$ and $\mathbf{p}$ to obtain the optimum values of $w_0$ and $w_1$, which
minimize $E[e^2(n)]$. For this example, we get

$$\mathbf{R} = \begin{bmatrix} E[x^2(n)] & E[x(n)x(n-1)] \\ E[x(n-1)x(n)] & E[x^2(n-1)] \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{3.28}$$

This follows, as x(n) is white, thus, $E[x(n)x(n-1)] = E[x(n-1)x(n)] = 0$, and also
it has a variance of unity. The latter implies that $E[x^2(n)] = E[x^2(n-1)] = 1$.
Also, we note that $d(n) = 2x(n) + 3x(n-1) + v(n)$, and, thus,

$$\mathbf{p} = \begin{bmatrix} E[x(n)d(n)] \\ E[x(n-1)d(n)] \end{bmatrix} = \begin{bmatrix} E[x(n)(2x(n) + 3x(n-1) + v(n))] \\ E[x(n-1)(2x(n) + 3x(n-1) + v(n))] \end{bmatrix} \tag{3.29}$$

Expanding the terms under the expectation operators, and noting that $E[x^2(n)] =
E[x^2(n-1)] = 1$ and $E[x(n)x(n-1)] = E[x(n)v(n)] = E[x(n-1)v(n)] = 0$, we get

$$\mathbf{p} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \tag{3.30}$$

Similarly, we obtain

$$\begin{aligned}
E[d^2(n)] &= E[(2x(n) + 3x(n-1) + v(n))^2] \\
&= 4E[x^2(n)] + 9E[x^2(n-1)] + \sigma_v^2 = 13.1
\end{aligned} \tag{3.31}$$


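Substituting Eqs. (3.28), (3.30), and (3.31) in Eq. (3.12), we get

$$\xi = 13.1 - 4w_0 - 6w_1 + w_0^2 + w_1^2 \tag{3.32}$$

This is a paraboloid in the three-dimensional space with the axes $w_0$, $w_1$ and $\xi$. Figure 3.4
shows this paraboloid. We may note that the optimum tap weights of the Wiener filter are
given by Eq. (3.25), which for the present example may be written as

$$\begin{bmatrix} w_{o,0} \\ w_{o,1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \tag{3.33}$$

Also, from Eq. (3.26),

$$\xi_{\min} = 13.1 - [2 \;\; 3] \begin{bmatrix} 2 \\ 3 \end{bmatrix} = 0.1 \tag{3.34}$$

Clearly, the values of $w_{o,0}$, $w_{o,1}$, and $\xi_{\min}$ coincide with the minimum point shown in
Figure 3.4.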
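
The location of the minimum can also be confirmed by evaluating the performance surface of Eq. (3.32) directly. The Python sketch below performs a coarse grid search. The grid range and spacing are arbitrary choices made for illustration, selected so that the point $(2, 3)$ falls on the grid.

```python
def xi(w0, w1):
    """Performance surface of Eq. (3.32)."""
    return 13.1 - 4.0 * w0 - 6.0 * w1 + w0 ** 2 + w1 ** 2

# Coarse grid over 0 <= w0, w1 <= 5 in steps of 0.25 (grid chosen for illustration)
grid = [0.25 * i for i in range(21)]
xi_best, w0_best, w1_best = min((xi(w0, w1), w0, w1) for w0 in grid for w1 in grid)

print(w0_best, w1_best, xi_best)
```

The grid minimum is found at $w_0 = 2$ and $w_1 = 3$ with a value of about 0.1, in agreement with Eqs. (3.33) and (3.34).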
The features of interest on the performance surface in Figure 3.4, and the results obtained
in Eqs. (3.33) and (3.34), may be understood better if we note that
the right-hand side of Eq. (3.32) may also be expressed as

$$\xi = 0.1 + (w_0 - 2)^2 + (w_1 - 3)^2 \tag{3.35}$$


Clearly, the minimum value of $\xi$ is achieved when the last two terms on the right-hand side
of Eq. (3.35) are forced to zero. This coincides with the results in Eqs. (3.33) and (3.34).

Figure 3.4 The performance surface of the modeling problem of Figure 3.3.


3.3 Principle of Orthogonality 


In this section, we present an alternative approach for the design of Wiener filters. This
presentation complements the derivations in the last section in the sense that the
approach presented below can be considered as a simplified/shortened version of the
approach in the last section. More importantly, it leads to more insight into the
Wiener filtering problem.

We start with the cost function equation (3.1), which in the case of real-valued data
may be written as

$$\xi = E[e^2(n)] \tag{3.36}$$

Taking partial derivatives of $\xi$ with respect to the filter tap weights, $\{w_i;\ i = 0, 1, \ldots, N-1\}$,
and interchanging the derivative and expectation operators (since these are linear
operators), we obtain

$$\frac{\partial \xi}{\partial w_i} = E\left[2e(n)\frac{\partial e(n)}{\partial w_i}\right], \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.37}$$


where $e(n) = d(n) - y(n)$. As d(n) is independent of the filter tap weights, we get

$$\frac{\partial e(n)}{\partial w_i} = -\frac{\partial y(n)}{\partial w_i} = -x(n-i) \tag{3.38}$$

where the last result is obtained by substituting for y(n) from Eq. (3.5). Using this result in
Eq. (3.37), we obtain

$$\frac{\partial \xi}{\partial w_i} = -2E[e(n)x(n-i)], \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.39}$$

From our discussion in the last section, we know that when the Wiener filter tap weights
are set to their optimal values, the partial derivatives of the cost function, $\xi$, with respect
to the filter tap weights are all zero. Hence, if $e_o(n)$ is the estimation error when the filter
tap weights are set equal to their optimal values, Eq. (3.39) becomes

$$E[e_o(n)x(n-i)] = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.40}$$


This shows that at the optimal setting of the Wiener filter tap weights, the estimation error 
is uncorrelated with the filter tap inputs, that is, the input samples used for estimation. 
This is known as the principle of orthogonality. 

The principle of orthogonality is an elegant result of the Wiener filtering that is fre- 
quently used for simple derivations of results which otherwise would seem far more 
difficult to derive. We will use the principle of orthogonality throughout this book for 
many of our derivations. 

As a useful corollary to the principle of orthogonality, we note that the filter output is
also uncorrelated with the estimation error when its tap weights are set to their optimal
values. This may be shown as follows:

$$\begin{aligned}
E[e_o(n)y_o(n)] &= E\left[e_o(n) \sum_{i=0}^{N-1} w_{o,i}\, x(n-i)\right] \\
&= \sum_{i=0}^{N-1} w_{o,i}\, E[e_o(n)x(n-i)]
\end{aligned} \tag{3.41}$$

where $y_o(n)$ is the Wiener filter output when its tap weights are set to their optimal values.
Then, using Eq. (3.40) in Eq. (3.41), we obtain

$$E[e_o(n)y_o(n)] = 0 \tag{3.42}$$


We may also refer to the above result by saying that the optimized Wiener filter output 
and the estimation error are orthogonal. 
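
The orthogonality relations can be observed on simulated data when the expectations are replaced by time averages (ergodicity assumed). The Python sketch below generates data from a hypothetical two-tap plant, solves the two-tap Wiener-Hopf equation built from sample averages, and then evaluates the sample versions of Eqs. (3.40) and (3.42). The plant coefficients, noise level, seed, and record length are illustrative assumptions.

```python
import random

random.seed(1)
num_samples = 2000
N = 2                                    # number of filter taps

x = [random.gauss(0.0, 1.0) for _ in range(num_samples)]
v = [random.gauss(0.0, 0.3) for _ in range(num_samples)]
# Assumed plant, used only to generate data: d(n) = 2x(n) + 3x(n-1) + v(n)
d = [2.0 * x[n] + 3.0 * x[n - 1] + v[n] for n in range(1, num_samples)]
taps = [[x[n], x[n - 1]] for n in range(1, num_samples)]     # [x(n), x(n-1)]
L = len(d)

# Time-average estimates of R and p
R = [[sum(t[i] * t[l] for t in taps) / L for l in range(N)] for i in range(N)]
p = [sum(t[i] * dn for t, dn in zip(taps, d)) / L for i in range(N)]

# Solve the 2-by-2 equation R w_o = p by Cramer's rule
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
w_o = [(p[0] * R[1][1] - R[0][1] * p[1]) / det,
       (R[0][0] * p[1] - R[1][0] * p[0]) / det]

y_o = [w_o[0] * t[0] + w_o[1] * t[1] for t in taps]
e_o = [dn - yn for dn, yn in zip(d, y_o)]

corr_e_x = [sum(e * t[i] for e, t in zip(e_o, taps)) / L for i in range(N)]  # Eq. (3.40)
corr_e_y = sum(e * y for e, y in zip(e_o, y_o)) / L                          # Eq. (3.42)
print(w_o, corr_e_x, corr_e_y)
```

Because the tap weights are computed from the same sample averages, the sample correlations between the error and the tap inputs vanish up to round-off, while `w_o` is only an estimate of the plant coefficients.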

The words orthogonality and orthogonal are commonly used for referring to pairs of 
random variables that are uncorrelated with each other. This originates from the fact that 
the set of all random variables with finite second-order moments constitutes a linear space 
with an inner product. The inner product in this space is defined to be the correlation 
between its elements. In particular, if x and y are two elements of the linear space of 
random variables, the inner product of x and y is defined as E[xy], when x and y are real- 
valued, or E[x y*], in the more general case of complex-valued random variables. Then, 
in analogy with the Euclidean space in which the elements are vectors, the geometrical 
concepts such as orthogonality, projection, and subspaces may also be defined for the 
space of random variables. Interested readers may refer to Honig and Messerschmitt 
(1984) for an excellent, yet simple, discussion on this topic.

Next, we use the principle of orthogonality to give an alternative derivation of the
Wiener-Hopf equation (3.24) and also the minimum mean-squared error of Eq. (3.26).
We note that

$$e_o(n) = d(n) - \sum_{l=0}^{N-1} w_{o,l}\, x(n-l) \tag{3.43}$$

where the $w_{o,l}$'s are the optimum values of the Wiener filter tap weights. Substituting Eq.
(3.43) in Eq. (3.40) and rearranging the results, we get

$$\sum_{l=0}^{N-1} E[x(n-i)x(n-l)]\, w_{o,l} = E[d(n)x(n-i)], \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.44}$$

We also note that $E[x(n-i)x(n-l)] = r_{il}$ and $E[d(n)x(n-i)] = p_i$. Using these in
Eq. (3.44), we obtain

$$\sum_{l=0}^{N-1} r_{il}\, w_{o,l} = p_i, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.45}$$
1=0 


which is nothing but Eq. (3.24) in expanded form.
Also, we note that

$$\begin{aligned}
\xi_{\min} &= E[e_o^2(n)] \\
&= E[e_o(n)(d(n) - y_o(n))] \\
&= E[e_o(n)d(n)] - E[e_o(n)y_o(n)] \\
&= E[e_o(n)d(n)]
\end{aligned} \tag{3.46}$$

where Eq. (3.42) has been used to obtain the last equality. Now, substituting Eq. (3.43)
in Eq. (3.46), we obtain

$$\begin{aligned}
\xi_{\min} &= E\left[\left(d(n) - \sum_{i=0}^{N-1} w_{o,i}\, x(n-i)\right) d(n)\right] \\
&= E[d^2(n)] - \sum_{i=0}^{N-1} w_{o,i}\, E[d(n)x(n-i)] \\
&= E[d^2(n)] - \sum_{i=0}^{N-1} w_{o,i}\, p_i
\end{aligned} \tag{3.47}$$

which is nothing but Eq. (3.26) in expanded form.


3.4 Normalized Performance Function 


Equation (3.43) can be written as

$$d(n) = e_o(n) + y_o(n) \tag{3.48}$$

Squaring both sides of Eq. (3.48) and taking expectation, we get

$$E[d^2(n)] = E[e_o^2(n)] + E[y_o^2(n)] + 2E[e_o(n)y_o(n)] \tag{3.49}$$

We may note that $E[e_o^2(n)] = \xi_{\min}$ and the last term in Eq. (3.49) is zero because of
Eq. (3.42). Thus, we obtain

$$\xi_{\min} = E[d^2(n)] - E[y_o^2(n)] \tag{3.50}$$

which suggests that the minimum mean-squared error at the Wiener filter output is the
difference between the mean-square of the desired output and the mean-square of the
best estimate of that at the filter output.

It is appropriate to define the ratio

$$\zeta = \frac{\xi}{E[d^2(n)]} \tag{3.51}$$

as the normalized performance function. We may note that $\zeta = 1$ when y(n) is forced to
zero; that is, when no estimation of d(n) has been made. It reaches its minimum value,
$\zeta_{\min}$, when the filter tap weights are chosen to achieve the minimum mean-squared error.
This is given by

$$\zeta_{\min} = 1 - \frac{E[y_o^2(n)]}{E[d^2(n)]} \tag{3.52}$$

Noting that $\zeta_{\min}$ cannot be negative, we find that its value remains between 0 and 1. The
value of $\zeta_{\min}$ is an indication of the ability of the filter to estimate the desired output. A
value of $\zeta_{\min}$ close to zero indicates good performance of the filter, and a value
of $\zeta_{\min}$ close to 1 indicates poor performance of the filter.
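As a small illustration of Eq. (3.52), the Python sketch below evaluates the normalized minimum mean-squared error for a few assumed values of $E[y_o^2(n)]$ and $E[d^2(n)]$. The function name and the numbers are hypothetical and serve only to show the two limiting cases discussed above.

```python
def normalized_min_mse(sigma_d2, sigma_yo2):
    """zeta_min = 1 - E[y_o^2(n)] / E[d^2(n)], as in Eq. (3.52)."""
    return 1.0 - sigma_yo2 / sigma_d2


# Hypothetical mean-square values, chosen only for illustration.
zeta_partial = normalized_min_mse(1.0, 0.52)   # filter explains part of d(n)
zeta_perfect = normalized_min_mse(1.0, 1.0)    # y_o(n) carries all power of d(n)
zeta_useless = normalized_min_mse(1.0, 0.0)    # y_o(n) is identically zero

print(zeta_partial, zeta_perfect, zeta_useless)
```

The three cases fall inside the interval from 0 to 1, with 0 corresponding to perfect estimation and 1 to no estimation at all.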


3.5 Extension to Complex-Valued Case 


There are some practical applications in which the underlying random processes are 
complex-valued. For instance, in data transmission (Chapter 17), the most frequently 
used signaling techniques are phase shift keying (PSK) and quadrature-amplitude modu- 
lation (QAM) in which the baseband signal consists of two separate components which 
are the real and imaginary parts of a complex-valued signal. Moreover, in the case of fre- 
quency domain implementation of adaptive filters (Chapter 8) and subband-adaptive filters 
(Chapter 9), we will be dealing with complex-valued signals, even though the original 
signals may be real-valued. 

In this section, we extend the results of the last two sections to the case of complex-valued
signals. We assume a transversal filter as shown in Figure 3.2. The input, x(n),
the desired output, d(n), and the filter tap weights are all assumed to be complex-valued.
Then, the estimation error, e(n), is also complex-valued and we may write

$$\xi = E[|e(n)|^2] = E[e(n)e^*(n)] \tag{3.53}$$


where the asterisk denotes complex conjugation. 
As in the real-valued case, the performance function, $\xi$, in the complex-valued case is
also a quadratic function of the filter tap weights. Similarly, to find the optimum set of the
filter tap weights, we have to solve the system of equations that results from setting the
partial derivatives of $\xi$ with respect to every tap weight to zero. However, because
the filter tap weights are complex variables, the conventional definition of derivative with
respect to an independent variable is not applicable to the present case. In fact, each tap
weight, in the present case, consists of two independent variables: its real part and its
imaginary part. Thus, the partial derivatives with respect to
these two independent variables have to be evaluated separately and the results have
to be set to zero to obtain the optimum tap weights of the Wiener filter. In particular,
to obtain the optimum set of the filter tap weights, the following set of equations has to
be solved simultaneously:


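$$\frac{\partial \xi}{\partial w_{i,R}} = 0 \quad \text{and} \quad \frac{\partial \xi}{\partial w_{i,I}} = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.54}$$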


where $w_{i,R}$ and $w_{i,I}$ denote the real and imaginary parts of $w_i$, respectively. To write
Eq. (3.54) in a more compact form, we note that $\xi$, $w_{i,R}$, and $w_{i,I}$ are all real. This
implies that the partial derivatives in Eq. (3.54) are also all real and thus the pairs of
equations in (3.54) may be combined together to obtain

$$\frac{\partial \xi}{\partial w_{i,R}} + j\frac{\partial \xi}{\partial w_{i,I}} = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.55}$$


where $j = \sqrt{-1}$. This, in turn, suggests the following definition of gradient of a function
with respect to a complex variable $w = w_R + jw_I$:

$$\nabla_w^c \triangleq \frac{\partial}{\partial w_R} + j\frac{\partial}{\partial w_I} \tag{3.56}$$


We note that when $\xi$ is a real function of $w_R$ and $w_I$, the real and imaginary parts of
$\nabla_w^c \xi$ are, respectively, equal to $\partial\xi/\partial w_R$ and $\partial\xi/\partial w_I$, and in that case $\nabla_w^c \xi = 0$ implies
that $\partial\xi/\partial w_R = \partial\xi/\partial w_I = 0$. It is in this context that we can say Eqs. (3.54) and (3.55) are
equivalent. This would not be true, in general, if $\xi$ were complex-valued (Problem P3.9).

With the above background, we may now continue with the derivation of the principle 
of orthogonality and its subsequent results, for the case of complex-valued signals. From 
Eq. (3.53), we note that 


$$\nabla_{w_i}^c \xi = E[e(n)\nabla_{w_i}^c e^*(n) + e^*(n)\nabla_{w_i}^c e(n)] \tag{3.57}$$

Noting that

$$e(n) = d(n) - \sum_{k=0}^{N-1} w_k\, x(n-k) \tag{3.58}$$

we obtain

$$\nabla_{w_i}^c e(n) = -x(n-i)\nabla_{w_i}^c w_i \tag{3.59}$$

and

$$\nabla_{w_i}^c e^*(n) = -x^*(n-i)\nabla_{w_i}^c w_i^* \tag{3.60}$$

Applying definition (3.56), we obtain

$$\nabla_{w_i}^c w_i = \frac{\partial w_i}{\partial w_{i,R}} + j\frac{\partial w_i}{\partial w_{i,I}} = 1 + j(j) = 1 - 1 = 0 \tag{3.61}$$

and

$$\nabla_{w_i}^c w_i^* = \frac{\partial w_i^*}{\partial w_{i,R}} + j\frac{\partial w_i^*}{\partial w_{i,I}} = 1 + j(-j) = 1 + 1 = 2 \tag{3.62}$$

Substituting Eqs. (3.61) and (3.62) in Eqs. (3.59) and (3.60), respectively, and the results
in Eq. (3.57), we obtain

$$\nabla_{w_i}^c \xi = -2E[e(n)x^*(n-i)] \tag{3.63}$$


When the Wiener filter tap weights are set to their optimal values, $\nabla_{w_i}^c \xi = 0$. This
gives

$$E[e_o(n)x^*(n-i)] = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.64}$$

where $e_o(n)$ is the optimum estimation error. The set of equations (3.64) represents the
principle of orthogonality for the case of complex-valued signals.
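The complex gradient of definition (3.56) and the two results (3.61) and (3.62) can be checked numerically. The sketch below is a minimal Python illustration; the helper name `complex_gradient`, the step size, and the test point are arbitrary choices. It approximates the two partial derivatives by central differences and combines them as in Eq. (3.56).

```python
def complex_gradient(f, w, h=1e-6):
    """Approximate (d/dw_R + j d/dw_I) f(w) by central differences, Eq. (3.56)."""
    d_real = (f(w + h) - f(w - h)) / (2 * h)             # derivative along w_R
    d_imag = (f(w + 1j * h) - f(w - 1j * h)) / (2 * h)   # derivative along w_I
    return d_real + 1j * d_imag


w = 0.3 - 0.8j   # arbitrary test point

grad_w = complex_gradient(lambda z: z, w)                    # Eq. (3.61): 0
grad_w_conj = complex_gradient(lambda z: z.conjugate(), w)   # Eq. (3.62): 2
grad_abs2 = complex_gradient(lambda z: abs(z) ** 2, w)       # real function: 2w

print(grad_w, grad_w_conj, grad_abs2)
```

The last line uses the real-valued function $|w|^2$, for which the real and imaginary parts of the complex gradient are the two ordinary partial derivatives, $2w_R$ and $2w_I$.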

To proceed with the derivation of the Wiener-Hopf equation, we define the input and
tap-weight vectors of the filter as

$$\mathbf{x}(n) \triangleq [x(n)\ \ x(n-1)\ \ \cdots\ \ x(n-N+1)]^T \tag{3.65}$$

and

$$\mathbf{w} \triangleq [w_0^*\ \ w_1^*\ \ \cdots\ \ w_{N-1}^*]^T \tag{3.66}$$

respectively, where the asterisk and T denote complex conjugation and transpose, respectively.
Note that the elements of the column vector $\mathbf{w}$ are complex conjugates of the actual
tap weights of the filter, while conjugation is not applied to the samples of input in $\mathbf{x}(n)$.
Also, for later reference, we may write

$$\mathbf{x}(n) = [x^*(n)\ \ x^*(n-1)\ \ \cdots\ \ x^*(n-N+1)]^H \tag{3.67}$$

and

$$\mathbf{w} = [w_0\ \ w_1\ \ \cdots\ \ w_{N-1}]^H \tag{3.68}$$

where the superscript H denotes complex-conjugate transpose or Hermitian.
The set of equations (3.64) may also be written as

$$E[e_o^*(n)x(n-i)] = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{3.69}$$

Using the definition (3.65), these may be packed together as

$$E[e_o^*(n)\mathbf{x}(n)] = \mathbf{0} \tag{3.70}$$

Also, we note that

$$e_o(n) = d(n) - \mathbf{w}_o^H\mathbf{x}(n) \tag{3.71}$$

where $\mathbf{w}_o$ is the optimum tap-weight vector of the Wiener filter.




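Replacing Eq. (3.71) in Eq. (3.70), we obtain

$$E[\mathbf{x}(n)(d^*(n) - \mathbf{x}^H(n)\mathbf{w}_o)] = \mathbf{0} \tag{3.72}$$

Rearranging Eq. (3.72), we get

$$\mathbf{R}\mathbf{w}_o = \mathbf{p} \tag{3.73}$$

where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^H(n)]$ and $\mathbf{p} = E[\mathbf{x}(n)d^*(n)]$. This is the Wiener-Hopf equation for
the case of complex-valued signals.
Also, following the same derivations as Eqs. (3.46) through (3.47), for the present case,
we obtain

$$\begin{aligned}
\xi_{\min} &= E[|d(n)|^2] - \mathbf{w}_o^H\mathbf{p} \\
&= E[|d(n)|^2] - \mathbf{w}_o^H\mathbf{R}\mathbf{w}_o
\end{aligned} \tag{3.74}$$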
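The complex-valued Wiener-Hopf equation can be exercised on a small case using Python's built-in complex numbers. The sketch below assumes a two-tap filter with a hypothetical Hermitian correlation matrix `R`, cross-correlation vector `p`, and desired-signal power `sigma_d2`; none of these values come from the text. It solves Eq. (3.73), evaluates both lines of Eq. (3.74), and confirms that moving away from the solution increases the mean-squared error.

```python
def hdot(a, b):
    """Return a^H b for two equal-length complex vectors."""
    return sum(ai.conjugate() * bi for ai, bi in zip(a, b))


def mat_vec(R, w):
    """Return the matrix-vector product R w."""
    return [sum(R[i][l] * w[l] for l in range(len(w))) for i in range(len(R))]


def solve_2x2(R, p):
    """Solve R w = p for a 2x2 complex system by Cramer's rule (Eq. 3.73)."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    w0 = (p[0] * R[1][1] - R[0][1] * p[1]) / det
    w1 = (R[0][0] * p[1] - R[1][0] * p[0]) / det
    return [w0, w1]


def mse(w, R, p, sigma_d2):
    """xi(w) = E|d|^2 - w^H p - p^H w + w^H R w, for e(n) = d(n) - w^H x(n)."""
    return sigma_d2 - hdot(w, p) - hdot(p, w) + hdot(w, mat_vec(R, w))


# Hypothetical statistics: R is Hermitian and positive definite.
R = [[1.0 + 0j, 0.5j],
     [-0.5j, 1.0 + 0j]]
p = [0.6 + 0.2j, 0.2 - 0.3j]
sigma_d2 = 1.0

w_o = solve_2x2(R, p)
xi_min = sigma_d2 - hdot(w_o, p)                      # first line of Eq. (3.74)
xi_min_alt = sigma_d2 - hdot(w_o, mat_vec(R, w_o))    # second line of Eq. (3.74)
residual = [a - b for a, b in zip(mat_vec(R, w_o), p)]

w_perturbed = [w_o[0] + 0.05j, w_o[1] - 0.05]
print(w_o, xi_min, mse(w_perturbed, R, p, sigma_d2))
```

Note that `w_o` here holds the elements of the vector $\mathbf{w}_o$ of Eq. (3.66), that is, the complex conjugates of the actual filter tap weights.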


3.6 Unconstrained Wiener Filters 


The developments in the last three sections put some constraints on the Wiener filter by
assuming that it is causal and the duration of its impulse response is limited. In this section,
we remove such constraints and let the Wiener filter impulse response, $w_i$, extend from
$i = -\infty$ to $i = +\infty$ and derive equations for the filter performance function and its optimal
system function. Such developments are very instructive for understanding many of
the important aspects of the Wiener filter, which otherwise could not be easily understood.

Consider the Wiener filter shown in Figure 3.1, and repeated here in Figure 3.5, for
convenience. We assume that the filter W(z) may be noncausal and/or IIR. In order to
keep the derivations in this section as simple as possible and also to place more emphasis
on the concepts, we consider only the case in which the underlying signals and system
parameters are real-valued. Moreover, we assume that the complex variable z remains on
the unit circle, that is, |z| = 1. This implies that $z^* = z^{-1}$. Also, for later reference,
we note that when the coefficients of a system function, such as W(z), are real-valued,
$W^*(1/z^*) = W(z^{-1})$, for all values of z, and $W(z^{-1}) = W^*(z)$, when |z| = 1.

The derivations that follow in this section depend heavily on the results developed in
Section 2.4.4 of Chapter 2. The reader is encouraged to review that section before
continuing with the rest of this section.


3.6.1 Performance Function 


Recall that the Wiener filter performance function is defined as 


$$\xi = E[e^2(n)]$$

Figure 3.5 Block diagram of a Wiener filter.

Substituting e(n) by d(n) - y(n) and expanding, we get

$$\xi = E[d^2(n)] + E[y^2(n)] - 2E[y(n)d(n)] \tag{3.75}$$

In terms of autocorrelation and cross-correlation functions (Chapter 2), we may write

$$\xi = \phi_{dd}(0) + \phi_{yy}(0) - 2\phi_{yd}(0) \tag{3.76}$$


Replacing the last two terms on the right-hand side of Eq. (3.76) with their corresponding 
inverse z-transform relations, we obtain 


$$\xi = \phi_{dd}(0) + \frac{1}{2\pi j}\oint_C \Phi_{yy}(z)\frac{dz}{z} - 2 \times \frac{1}{2\pi j}\oint_C \Phi_{yd}(z)\frac{dz}{z} \tag{3.77}$$


Also, from our discussion in Chapter 2, Section 2.4.4, we recall that when x(n) and y(n)
are related as shown in Figure 3.5, for an arbitrary sequence d(n), $\Phi_{yd}(z) = W(z)\Phi_{xd}(z)$.
Also, if z is selected to be on the unit circle in the z-plane, $\Phi_{yy}(z) = |W(z)|^2\Phi_{xx}(z)$,
$|W(z)|^2 = W(z)W^*(z)$, and $W^*(z) = W(z^{-1})$. Using these in Eq. (3.77), we obtain

$$\begin{aligned}
\xi &= \phi_{dd}(0) + \frac{1}{2\pi j}\oint_C W(z)W^*(z)\Phi_{xx}(z)\frac{dz}{z} - 2 \times \frac{1}{2\pi j}\oint_C W(z)\Phi_{xd}(z)\frac{dz}{z} \\
&= \phi_{dd}(0) + \frac{1}{2\pi j}\oint_C [W^*(z)\Phi_{xx}(z) - 2\Phi_{xd}(z)]\,W(z)\frac{dz}{z}
\end{aligned} \tag{3.78}$$


where the contour of integration, C, is the unit circle. This is the performance function 
for a Wiener filter with the system function W (z), in its most general form. It covers IIR 
and FIR, as well as causal and noncausal filters. The following examples show some of 
the flexibilities of Eq. (3.78). 


Example 3.2 


Consider the case where the Wiener filter is an N-tap FIR filter with the system function 


$$W(z) = \sum_{l=0}^{N-1} w_l z^{-l} \tag{3.79}$$


This is the case that we studied in Sections 3.1 and 3.2. 
Using Eq. (3.79) in the first line of Eq. (3.78), we obtain 


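$$\xi = \phi_{dd}(0) + \frac{1}{2\pi j}\oint_C \left(\sum_{l=0}^{N-1} w_l z^{-l}\right)\left(\sum_{m=0}^{N-1} w_m z^{m}\right)\Phi_{xx}(z)\frac{dz}{z} - 2 \times \frac{1}{2\pi j}\oint_C \left(\sum_{l=0}^{N-1} w_l z^{-l}\right)\Phi_{xd}(z)\frac{dz}{z} \tag{3.80}$$

Interchanging the order of the integrations and summations, Eq. (3.80) is simplified to

$$\xi = \phi_{dd}(0) + \sum_{l=0}^{N-1}\sum_{m=0}^{N-1} w_l w_m \frac{1}{2\pi j}\oint_C \Phi_{xx}(z)z^{m-l-1}\,dz - 2\sum_{l=0}^{N-1} w_l \frac{1}{2\pi j}\oint_C \Phi_{xd}(z)z^{-l-1}\,dz \tag{3.81}$$

Using the inverse z-transform relation, this gives

$$\xi = \phi_{dd}(0) + \sum_{l=0}^{N-1}\sum_{m=0}^{N-1} w_l w_m \phi_{xx}(m-l) - 2\sum_{l=0}^{N-1} w_l \phi_{xd}(-l) \tag{3.82}$$

Now, using the notations $\phi_{dd}(0) = E[d^2(n)]$, $\phi_{xd}(-l) = p_l$, and $\phi_{xx}(m-l) = \phi_{xx}(l-m) = r_{lm}$,
we see that the performance function given by Eq. (3.82) is the same as what
we had derived earlier in Eq. (3.16).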


Example 3.3 


Consider the modeling problem depicted in Figure 3.6, where a plant G(z) is being 
modeled by a single-pole single-zero Wiener filter 


$$W(z) = \frac{1 - w_0 z^{-1}}{1 - w_1 z^{-1}} \tag{3.83}$$

To keep our discussion simple, we assume that all the involved signals and system
parameters are real-valued. The input sequence, x(n), is assumed to be a white process
with zero mean and unit variance, and uncorrelated with the additive noise v(n). This
implies that

$$\Phi_{xx}(z) = 1 \quad \text{and} \quad \Phi_{xv}(z) = 0 \tag{3.84}$$


Figure 3.6 A modeling problem with an IIR model.

We note that d(n) is the noise-corrupted output of the plant, G(z), when it is excited with
the input x(n). Then, using the relationship (2.75) given in Chapter 2, and noting that all
the signals here are real-valued, we get

$$\Phi_{xd}(z) = G(z^{-1})\Phi_{xx}(z) \tag{3.85}$$
Using this in Eq. (3.78), we obtain 


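$$\xi = \phi_{dd}(0) + \frac{1}{2\pi j}\oint_C \frac{1 - w_0 z^{-1}}{1 - w_1 z^{-1}} \cdot \frac{1 - w_0 z}{1 - w_1 z}\,\frac{dz}{z} - 2 \times \frac{1}{2\pi j}\oint_C \frac{1 - w_0 z^{-1}}{1 - w_1 z^{-1}}\,G(z^{-1})\,\frac{dz}{z} \tag{3.86}$$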


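Using the residue theorem to calculate the above-mentioned integrals and assuming that
$G(z^{-1})$ has no pole inside the unit circle, we get

$$\xi = \phi_{dd}(0) + \frac{w_1 - w_0}{w_1} \cdot \frac{1 - w_0 w_1}{1 - w_1^2} + \frac{w_0}{w_1} - 2\left[\frac{w_1 - w_0}{w_1}\,G(w_1^{-1}) + \frac{w_0}{w_1}\,G(\infty)\right] \tag{3.87}$$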


This is the performance function of the IIR filter shown in Figure 3.6. We note that,
although we have selected a very simple example, the resulting performance function is
a complicated one. It is clear that a performance function such as Eq. (3.87), or the more
complicated ones that would result for higher order filters, is difficult to handle. In
particular, one may find that there can be many local minima, and searching for the global
minimum of the performance function may not be a trivial task. This, when compared
with the nicely shaped quadratic performance function of FIR filters, makes it clear why
most of the attention in adaptive filters has been devoted to the transversal structure.
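The closed-form result of Eq. (3.87) can be cross-checked against a direct numerical evaluation of the two contour integrals in Eq. (3.86). The Python sketch below does this under stated assumptions: the filter parameters `w0` and `w1` (with $|w_1| < 1$) and the plant $G(z) = 1/(1 - 0.5z^{-1})$ are hypothetical choices, not taken from the text. On the unit circle, $z = e^{j\theta}$ and $dz/z = j\,d\theta$, so each integral reduces to the average of the integrand over equally spaced points on the circle.

```python
import cmath
import math

w0, w1 = 0.3, 0.6   # hypothetical filter parameters; |w1| < 1 is assumed


def W(z):
    """Single-pole single-zero filter of Eq. (3.83)."""
    return (1 - w0 / z) / (1 - w1 / z)


def G(z):
    """Hypothetical stable, causal plant G(z) = 1 / (1 - 0.5 z^-1)."""
    return 1 / (1 - 0.5 / z)


def unit_circle_average(f, M=4096):
    """(1/(2 pi j)) * contour integral of f(z) dz/z over |z| = 1."""
    return sum(f(cmath.exp(2j * math.pi * k / M)) for k in range(M)) / M


# Numerical values of the two integrals in Eq. (3.86).
term1_num = unit_circle_average(lambda z: W(z) * W(1 / z))
term2_num = unit_circle_average(lambda z: W(z) * G(1 / z))

# Closed-form values from Eq. (3.87).
term1_closed = (w1 - w0) / w1 * (1 - w0 * w1) / (1 - w1 ** 2) + w0 / w1
G_inf = 1.0   # G(z) -> 1 as |z| -> infinity for this plant
term2_closed = (w1 - w0) / w1 * G(1 / w1) + w0 / w1 * G_inf

print(term1_num.real, term1_closed)
print(term2_num.real, term2_closed)
```

With these assumed values, $\xi - \phi_{dd}(0)$ equals the first term minus twice the second, and the numerical and closed-form evaluations agree to within rounding error.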


3.6.2 Optimum Transfer Function 


We now derive an equation for the optimum transfer function of unconstrained Wiener
filters, that is, when the filter impulse response is allowed to extend from time $n = -\infty$
to $n = +\infty$. We use the principle of orthogonality for this purpose. As the filter impulse
response stretches from time $n = -\infty$ to $n = +\infty$, the principle of orthogonality for
real-valued signals suggests

$$E[e_o(n)x(n-i)] = 0, \quad \text{for } i = \ldots, -2, -1, 0, 1, 2, \ldots \tag{3.88}$$

where $e_o(n)$ is the optimum estimation error and is given by

$$e_o(n) = d(n) - \sum_{l=-\infty}^{\infty} w_{o,l}\, x(n-l) \tag{3.89}$$


Here, the $w_{o,l}$'s are the samples of the optimized Wiener filter impulse response.
Substituting Eq. (3.89) in Eq. (3.88) and rearranging the result, we obtain

$$\sum_{l=-\infty}^{\infty} w_{o,l}\, E[x(n-l)x(n-i)] = E[d(n)x(n-i)] \tag{3.90}$$

We may also note that $E[x(n-l)x(n-i)] = \phi_{xx}(i-l)$ and $E[d(n)x(n-i)] = \phi_{dx}(i)$.
Using these in Eq. (3.90), we get

$$\sum_{l=-\infty}^{\infty} w_{o,l}\, \phi_{xx}(i-l) = \phi_{dx}(i), \quad \text{for } i = \ldots, -2, -1, 0, 1, 2, \ldots \tag{3.91}$$

Noting that Eq. (3.91) holds for all values of i, we may take z-transforms on both sides
to obtain

$$\Phi_{xx}(z)W_o(z) = \Phi_{dx}(z) \tag{3.92}$$

This is referred to as the Wiener-Hopf equation for the unconstrained Wiener filtering
problem. The optimum unconstrained Wiener filter is given by

$$W_o(z) = \frac{\Phi_{dx}(z)}{\Phi_{xx}(z)} \tag{3.93}$$

Replacing z by $e^{j\omega}$ in Eq. (3.93), we obtain

$$W_o(e^{j\omega}) = \frac{\Phi_{dx}(e^{j\omega})}{\Phi_{xx}(e^{j\omega})} \tag{3.94}$$

This result has an interesting interpretation. It shows that the frequency response of the
optimal Wiener filter, for a particular frequency, say $\omega = \omega_i$, is determined by the ratio of
the cross-power spectral density of d(n) and x(n) to the power spectral density of x(n),
at $\omega = \omega_i$. This, in turn, may be obtained through a sequence of filtering and averaging
steps, as depicted in Figure 3.7. The sequences x(n) and d(n) are first filtered by two
identical narrow-band filters, centered at $\omega = \omega_i$. To keep the phase information of the
underlying signals, these filters are designed to pick up signals from the positive side
of the frequency axis only. The signal spectra belonging to negative frequencies are
completely rejected. As a result, the filtered signals, $d_i(n)$ and $x_i(n)$, are both complex-valued.
The cross-correlation of $d_i(n)$ and $x_i(n)$ with zero lag, that is, $E[d_i(n)x_i^*(n)]$, gives
a quantity proportional to $\Phi_{dx}(e^{j\omega_i})$, and the average energy of $x_i(n)$ gives a quantity
proportional to $\Phi_{xx}(e^{j\omega_i})$; see Papoulis (1991). The ratio of these two quantities gives
$W_o(e^{j\omega_i})$. This interpretation becomes more interesting if we note that $W_o(e^{j\omega_i})$ is also
the optimum tap weight of a single-tap Wiener filter whose input and desired output are
the complex-valued random processes $x_i(n)$ and $d_i(n)$, respectively; see Problem P3.21.

Figure 3.7 Procedure for calculating the transfer function of a Wiener filter through a sequence
of filtering and averaging (in the figure, * denotes conjugation).
The minimum mean-squared estimation error for the unconstrained Wiener filtering
case can be obtained by substituting Eq. (3.93) in Eq. (3.78). For this, we first note that
when |z| = 1,

$$\begin{aligned}
|W_o(z)|^2 &= W_o(z)W_o^*(z) \\
&= W_o(z)\frac{\Phi_{dx}^*(z)}{\Phi_{xx}^*(z)} \\
&= W_o(z)\frac{\Phi_{xd}(z)}{\Phi_{xx}(z)}
\end{aligned} \tag{3.95}$$

as, on the unit circle, $\Phi_{dx}^*(z) = \Phi_{xd}(z)$ and $\Phi_{xx}^*(z) = \Phi_{xx}(z)$. Using this result in
Eq. (3.78), we get

$$\xi_{\min} = \phi_{dd}(0) - \frac{1}{2\pi j}\oint_C W_o(z)\Phi_{xd}(z)\frac{dz}{z} \tag{3.96}$$


This may be considered as a dual of the previous derivations in Eqs. (3.26), (3.47), and 
(3.74); see Problem P3.18. 
Replacing z by $e^{j\omega}$ in Eq. (3.96), we obtain

$$\xi_{\min} = \phi_{dd}(0) - \frac{1}{2\pi}\int_{-\pi}^{\pi} W_o(e^{j\omega})\Phi_{xd}(e^{j\omega})\,d\omega \tag{3.97}$$


3.6.3 Modeling 


In this and the subsequent two subsections, we discuss three specific applications of 
Wiener filters, namely, modeling, inverse modeling, and noise cancellation. These cover 
most of the cases that we encounter in adaptive filtering. Our aim, in these presentations, 
is to highlight some of the important features of Wiener filters when applied to various 
applications of adaptive signal processing. 

Consider the modeling problem depicted in Figure 3.8. An estimate of the model of a
plant G(z) is to be obtained by the Wiener filter W(z). The plant input u(n), contaminated
with an additive noise $v_i(n)$, is available as the Wiener filter input. The noise sequence
$v_i(n)$ may be thought of as introduced by a transducer that is used to get samples of
the plant input. There is also an additive noise $v_o(n)$ at the plant output. The sequences
u(n), $v_i(n)$, and $v_o(n)$ are assumed to be stationary, zero-mean, and uncorrelated with one
another.

Figure 3.8 Block diagram of a modeling problem.

We note that, for the present problem, the Wiener filter input and its desired output are,
respectively,

$$x(n) = u(n) + v_i(n) \tag{3.98}$$

and

$$d(n) = g_n \star u(n) + v_o(n) \tag{3.99}$$

where the $g_n$'s are the samples of the plant impulse response, and $\star$ denotes convolution.
We use Eq. (3.93) to obtain the optimum transfer function of the Wiener filter, $W_o(z)$.
For this, we should first find $\Phi_{xx}(z)$ and $\Phi_{dx}(z)$. We note that


$$\begin{aligned}
\phi_{xx}(k) &= E[x(n)x(n-k)] \\
&= E[(u(n) + v_i(n))(u(n-k) + v_i(n-k))] \\
&= E[u(n)u(n-k)] + E[u(n)v_i(n-k)] \\
&\quad + E[v_i(n)u(n-k)] + E[v_i(n)v_i(n-k)]
\end{aligned} \tag{3.100}$$

As u(n) and $v_i(n)$ are uncorrelated with each other, the second and third terms on the
right-hand side of Eq. (3.100) are zero. Thus, we obtain

$$\phi_{xx}(k) = \phi_{uu}(k) + \phi_{v_i v_i}(k) \tag{3.101}$$

Taking z-transform on both sides of Eq. (3.101), we get

$$\Phi_{xx}(z) = \Phi_{uu}(z) + \Phi_{v_i v_i}(z) \tag{3.102}$$


To find $\Phi_{dx}(z)$, we note that only u(n) is common to x(n) and d(n), and the signals
u(n), $v_i(n)$, and $v_o(n)$ are uncorrelated with one another. Considering these and following
a procedure similar to the one used to arrive at Eq. (3.102), one can show that

$$\Phi_{dx}(z) = \Phi_{d'u}(z) \tag{3.103}$$

where d'(n) is the plant output when the additive noise $v_o(n)$ is excluded.
Moreover, from our discussions in Chapter 2, we have

$$\Phi_{d'u}(z) = G(z)\Phi_{uu}(z) \tag{3.104}$$

Thus,

$$\Phi_{dx}(z) = G(z)\Phi_{uu}(z) \tag{3.105}$$


Using Eqs. (3.105) and (3.102) in Eq. (3.93), we obtain

$$W_o(z) = \frac{\Phi_{uu}(z)}{\Phi_{uu}(z) + \Phi_{v_i v_i}(z)}\,G(z) \tag{3.106}$$

We note that $W_o(z)$ is equal to G(z) only when $\Phi_{v_i v_i}(z)$ is equal to zero, that is, when
$v_i(n)$ is zero for all values of n.
It is also instructive to replace z by $e^{j\omega}$ in Eq. (3.106). This gives

$$W_o(e^{j\omega}) = \frac{\Phi_{uu}(e^{j\omega})}{\Phi_{uu}(e^{j\omega}) + \Phi_{v_i v_i}(e^{j\omega})}\,G(e^{j\omega}) \tag{3.107}$$


This result has the following interpretation. Matching between the unconstrained Wiener
filter and the plant frequency response at any particular frequency, $\omega$, depends on the
signal-to-noise power spectral density ratio $\Phi_{uu}(e^{j\omega})/\Phi_{v_i v_i}(e^{j\omega})$. Perfect matching is
achieved when this ratio is infinity (i.e., when $\Phi_{v_i v_i}(e^{j\omega}) = 0$), and the mismatch between
the plant and its model increases as $\Phi_{uu}(e^{j\omega})/\Phi_{v_i v_i}(e^{j\omega})$ decreases. Note that $\Phi_{uu}(e^{j\omega})$
and $\Phi_{v_i v_i}(e^{j\omega})$ are power spectral density functions, and, thus, are real and nonnegative.
We may also define

$$K(e^{j\omega}) \triangleq \frac{\Phi_{uu}(e^{j\omega})}{\Phi_{uu}(e^{j\omega}) + \Phi_{v_i v_i}(e^{j\omega})} \tag{3.108}$$

and note that $K(e^{j\omega})$ is real and varies in the range of 0 to 1 as the power spectral density
functions $\Phi_{uu}(e^{j\omega})$ and $\Phi_{v_i v_i}(e^{j\omega})$ are both real and nonnegative. Furthermore, to prevent
ambiguity of the above ratio, we assume that for all values of $\omega$, $\Phi_{uu}(e^{j\omega})$ and $\Phi_{v_i v_i}(e^{j\omega})$
are never equal to zero simultaneously. Using this, we obtain

$$W_o(e^{j\omega}) = K(e^{j\omega})G(e^{j\omega}) \tag{3.109}$$
An expression for the minimum mean-squared error of the modeling problem is obtained
by substituting Eqs. (3.105) and (3.109) in Eq. (3.97). This gives

$$\xi_{\min} = \phi_{dd}(0) - \frac{1}{2\pi}\int_{-\pi}^{\pi} K(e^{j\omega})\Phi_{uu}(e^{j\omega})|G(e^{j\omega})|^2\,d\omega \tag{3.110}$$

We may also note that d'(n) and $v_o(n)$ are uncorrelated, and, thus,

$$\phi_{dd}(0) = \phi_{v_o v_o}(0) + \phi_{d'd'}(0) \tag{3.111}$$

Also,

$$\phi_{d'd'}(0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{uu}(e^{j\omega})|G(e^{j\omega})|^2\,d\omega \tag{3.112}$$

Substituting Eqs. (3.111) and (3.112) in Eq. (3.110), we obtain

$$\xi_{\min} = \phi_{v_o v_o}(0) + \frac{1}{2\pi}\int_{-\pi}^{\pi} (1 - K(e^{j\omega}))\Phi_{uu}(e^{j\omega})|G(e^{j\omega})|^2\,d\omega \tag{3.113}$$

We note that the minimum mean-squared estimation error consists of two distinct
components. The first one comes directly from the additive noise, $v_o(n)$, at the plant output.
The Wiener filter will not be able to reduce this component as $v_o(n)$ is uncorrelated with
its input x(n). The second component arises because of the input noise, $v_i(n)$, which, in
turn, results in some mismatch between G(z) and $W_o(z)$. Thus, the best performance that
one can expect from the optimum unconstrained Wiener filter is $\xi_{\min} = \phi_{v_o v_o}(0)$, and this
happens when the input noise $v_i(n)$ is absent.

Another very important and useful concept that can be understood based on the above
theoretical exercise is the principle of correlation cancellation. We remarked previously
that the Wiener filter cannot do anything to reduce the contribution $\phi_{v_o v_o}(0)$ to the total
mean-squared error. This is because the input x(n) of the Wiener filter is uncorrelated with
the output noise $v_o(n)$ and hence, the filter tries to match its output y(n) with the plant
output d'(n) without bothering about $v_o(n)$. In other words, the Wiener filter attempts to
estimate that part of the target signal d(n) which is correlated with its own input x(n) (i.e.,
d'(n)) and leaves the remaining part of d(n) (i.e., $v_o(n)$) unaffected. This is known as the
principle of correlation cancellation. However, as noted above, perfect cancellation of the
correlated part d'(n) from d(n) will only be possible when the input noise $v_i(n)$ is absent.
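The modeling results can be illustrated with a small numerical case. The Python sketch below assumes white sequences u(n), $v_i(n)$, and $v_o(n)$ with hypothetical variances and a hypothetical two-tap FIR plant; none of these values come from the text. With white spectra, $K(e^{j\omega})$ of Eq. (3.108) is a constant, and the integral in Eq. (3.113) is approximated by an average over a uniform frequency grid.

```python
import cmath
import math

# Hypothetical variances of the white sequences u(n), v_i(n), and v_o(n).
sigma_u2, sigma_vi2, sigma_vo2 = 1.0, 0.25, 0.01
g = [1.0, 0.5]   # hypothetical plant impulse response, G(z) = 1 + 0.5 z^-1


def G(omega):
    """Plant frequency response G(e^{j omega})."""
    return sum(gn * cmath.exp(-1j * omega * n) for n, gn in enumerate(g))


def K(omega):
    """Eq. (3.108); constant here because both spectra are white."""
    return sigma_u2 / (sigma_u2 + sigma_vi2)


def W_o(omega):
    """Optimum unconstrained Wiener filter, Eq. (3.109)."""
    return K(omega) * G(omega)


# Approximate (1/2pi) * integral over [-pi, pi) by a mean over a uniform grid.
M = 1024
omegas = [-math.pi + 2 * math.pi * k / M for k in range(M)]
mismatch = sum((1 - K(w)) * sigma_u2 * abs(G(w)) ** 2 for w in omegas) / M
xi_min = sigma_vo2 + mismatch   # Eq. (3.113)

print("K =", K(0.0), " xi_min =", xi_min)
```

For these assumed values, the output-noise term contributes 0.01 and the mismatch caused by the input noise contributes the rest. Setting `sigma_vi2` to zero drives K to 1 and leaves only the output-noise term, as the discussion above predicts.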




3.6.4 Inverse Modeling 


Inverse modeling has applications both in communications and control. However, most of
the theory of inverse modeling has been developed in the context of channel equalization,
and we also emphasize the latter. Figure 3.9 depicts a channel equalization scenario. The
data samples, s(n), are transmitted through a communication channel with the system
function H(z). The received signal at the channel output is contaminated with an additive
noise v(n), which is assumed to be uncorrelated with the data samples, s(n). An equalizer,
W(z), is used to process the received noisy signal samples, x(n), to recover the
original data samples, s(n).

Figure 3.9 Channel equalization.

When the additive noise at the channel output is absent, the equalizer has the following
trivial solution:

$$W_o(z) = \frac{1}{H(z)} \tag{3.114}$$

In the absence of channel noise, this results in perfect recovery of the original data
samples, as $W_o(z)H(z) = 1$. This implies that y(n) = s(n), and thus e(n) = 0, for all n.
This, clearly, is the optimum solution, as it results in zero mean-squared error, which of
course is the minimum as mean-squared error is a nonnegative quantity. The following
example gives a better view of the problem.


Example 3.4 


Consider a channel with 
$$H(z) = -0.4 + z^{-1} - 0.4z^{-2} \tag{3.115}$$

Also, assume that the channel noise, v(n), is zero, for all n. The channel output then is
obtained by convolving the input data sequence, s(n), with the channel impulse response,
$h_n$, which consists of three nonzero samples $h_0 = -0.4$, $h_1 = 1$, and $h_2 = -0.4$. This
gives

$$x(n) = -0.4s(n) + s(n-1) - 0.4s(n-2) \tag{3.116}$$


in the absence of channel noise. 

We note that each sample of x(n) is made of a mixture of three successive samples of the
original data. This is called intersymbol interference (ISI), and it should be compensated
or canceled for correct detection of transmitted data. For this purpose, we may use an
equalizer with the system function (3.114):

$$W_o(z) = \frac{1}{H(z)} = \frac{1}{-0.4 + z^{-1} - 0.4z^{-2}} \tag{3.117}$$

Factorizing the denominator of $W_o(z)$ and rearranging, we get

$$W_o(z) = \frac{-2.5}{(1 - 0.5z^{-1})(1 - 2z^{-1})} \tag{3.118}$$

This is a system function with one pole inside and one pole outside the unit circle. With
reference to our discussions in Chapter 2, we recall that Eq. (3.118) will correspond to
a stable time-invariant system, if the region of convergence of $W_o(z)$ includes the unit
circle. Considering this and finding the inverse z-transform of $W_o(z)$, we obtain (Chapter 2,
Section 2.2)

$$w_{o,i} = \begin{cases} \dfrac{10}{3} \times 2^{i}, & i < 0 \\[4pt] \dfrac{2.5}{3} \times 0.5^{i}, & i \ge 0 \end{cases} \tag{3.119}$$
To obtain this result, we have noted that $W_o(z)$ of Eq. (3.118) is similar to H(z) of
Eq. (2.30), except for the factor -2.5 in Eq. (3.118). Figure 3.10a-c shows the samples
of the impulse responses of the channel, equalizer, and their convolution, respectively.
Existence of ISI at the channel output, as noted above, is because of more than one (here,
three) nonzero samples in the channel impulse response. This is observed in Figure 3.10a.
Figure 3.10c shows that the ISI is completely removed after passing the received signal
through the equalizer.
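The cancellation of ISI in this example can be verified directly by convolving the channel impulse response with the two-sided equalizer response. The following Python sketch uses the channel of Eq. (3.115) and the equalizer samples $w_{o,i} = (10/3) \times 2^i$ for $i < 0$ and $w_{o,i} = (2.5/3) \times 0.5^i$ for $i \ge 0$. The helper names and the range of lags examined are arbitrary choices made for illustration.

```python
# Channel impulse response of Eq. (3.115): h_0, h_1, h_2.
h = {0: -0.4, 1: 1.0, 2: -0.4}


def w_o(i):
    """Two-sided (noncausal) equalizer impulse response of this example."""
    if i < 0:
        return (10.0 / 3.0) * 2.0 ** i
    return (2.5 / 3.0) * 0.5 ** i


def cascade(n):
    """Sample n of the convolution of the channel and equalizer responses."""
    return sum(h[k] * w_o(n - k) for k in h)


c = {n: cascade(n) for n in range(-10, 11)}
for n in range(-3, 4):
    print(n, round(c[n], 12))
```

The cascade is a unit sample at n = 0 and zero at every other lag, which is the behavior plotted in Figure 3.10c.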


When the channel noise, v(n), is nonzero, the solution provided by Eq. (3.114) may
not be optimal. The channel noise also passes through the equalizer and may be greatly
enhanced in the frequency bands where $|H(e^{j\omega})|$ is small. In this situation, a compromise
has to be made between cancellation of ISI and noise enhancement. As we show in the
following, the optimal Wiener filter achieves this trade-off in an effective way.


Figure 3.10 Impulse response of (a) channel response, (b) equalizer response, and (c) cascade of
channel and equalizer.


To derive an equation for $W_o(z)$ when the channel noise is nonzero, we use Eq. (3.93).
We note that

$$x(n) = h_n \star s(n) + v(n) \tag{3.120}$$

and

$$d(n) = s(n) \tag{3.121}$$


where $h_n$ is the impulse response of the channel, H(z).
Noting that s(n) and v(n) are uncorrelated and using the results of Section 2.4.4, we
obtain, from Eq. (3.120)

$$\Phi_{xx}(z) = \Phi_{ss}(z)|H(z)|^2 + \Phi_{vv}(z) \tag{3.122}$$

Also, from Eqs. (3.120) and (3.121), we may note that x(n) is the output of a system
with input s(n) and impulse response $h_n$, plus an uncorrelated noise, v(n). Noting these
and the fact that all the processes and system parameters are real-valued, we obtain

$$\Phi_{dx}(z) = \Phi_{sx}(z) = H(z^{-1})\Phi_{ss}(z) \tag{3.123}$$

Note that the above result is independent of v(n). Also, with |z| = 1, we may write

$$\Phi_{dx}(z) = H^*(z)\Phi_{ss}(z) \tag{3.124}$$

Using Eqs. (3.122) and (3.124) in Eq. (3.93), we obtain

$$W_o(z) = \frac{H^*(z)\Phi_{ss}(z)}{\Phi_{ss}(z)|H(z)|^2 + \Phi_{vv}(z)} \tag{3.125}$$

This is the general solution to the equalization problem when there is no constraint on the
equalizer length and the equalizer is allowed to be noncausal. Equation (3.125) includes the
effects of the autocorrelation functions of the data, s(n), and the noise, v(n).

To give an interpretation of Eq. (3.125), we divide the numerator and denominator by
the first term in the denominator to obtain

$$W_o(z) = \frac{1}{1 + \dfrac{\Phi_{vv}(z)}{\Phi_{ss}(z)|H(z)|^2}} \cdot \frac{1}{H(z)} \tag{3.126}$$
Next, we replace z by $e^{j\omega}$, and define the parameter

$$\rho(e^{j\omega}) \triangleq \frac{\Phi_{ss}(e^{j\omega})|H(e^{j\omega})|^2}{\Phi_{vv}(e^{j\omega})} \tag{3.127}$$

We may note that this is the signal-to-noise power spectral density ratio at the channel
output. $\Phi_{ss}(e^{j\omega})|H(e^{j\omega})|^2$ and $\Phi_{vv}(e^{j\omega})$ are the signal power spectral density and noise
power spectral density, respectively, at the channel output. Substituting Eq. (3.127) in
Eq. (3.126) and rearranging, we obtain

$$W_o(e^{j\omega}) = \frac{\rho(e^{j\omega})}{1 + \rho(e^{j\omega})} \cdot \frac{1}{H(e^{j\omega})} \tag{3.128}$$

We note that the frequency response of the optimized equalizer is proportional to the
inverse of the channel frequency response, with a proportionality factor that is frequency
dependent. Furthermore, $\rho(e^{j\omega})$ is a nonnegative real quantity, for power spectra are
nonnegative real functions. Hence,

$$0 \le \frac{\rho(e^{j\omega})}{1 + \rho(e^{j\omega})} \le 1 \tag{3.129}$$

This brings us to the following interpretation of Eq. (3.128). The frequency response of
the optimum equalizer equals the channel inverse scaled by a real-valued factor in the
range of 0 to 1. This factor, which is frequency dependent, depends on the signal-to-noise
power spectral density ratio, $\rho(e^{j\omega})$, at the equalizer input. It approaches 1 when
$\rho(e^{j\omega})$ is large, and decreases as $\rho(e^{j\omega})$ decreases.

Once again, it is important to note that different frequencies are treated independently
of one another by the equalizer. In particular, at a given frequency $\omega = \omega_i$, $W_o(e^{j\omega_i})$
depends only on the values of $H(e^{j\omega})$ and $\rho(e^{j\omega})$ at $\omega = \omega_i$. With this background, we
shall now examine Eq. (3.128) closely to see how the equalizer is able to make a good
trade-off between cancellation of ISI and noise enhancement. In the frequency regions
where the noise is almost absent, the value of $\rho(e^{j\omega})$ is very large and hence the equalizer
approximates the inverse of the channel closely, without any significant enhancement of
noise. On the other hand, in the frequency regions where the noise level is high (relative
to the signal level), the value of $\rho(e^{j\omega})$ is not large and hence the equalizer does not
approximate the channel inverse well. This, of course, is to prevent noise enhancement.


Example 3.5 


Consider the channel H(z) of Example 3.4. We assume that the data sequence, s(n), is
binary (taking values of +1 and -1) and white. We also assume that v(n) is a white
noise process with variance of 0.04. With these, we obtain

$$\Phi_{ss}(z) = 1 \quad \text{and} \quad \Phi_{vv}(z) = 0.04$$

Using these in Eq. (3.125), we get

$$W_o(z) = \frac{-0.4 + z - 0.4z^2}{(-0.4 + z^{-1} - 0.4z^{-2})(-0.4 + z - 0.4z^2) + 0.04} \tag{3.130}$$


Figure 3.11 presents the plots of $1/|H(e^{j\omega})|$ and $|W_o(e^{j\omega})|$. We note that at those frequencies
where $1/|H(e^{j\omega})|$ is small, a near-perfect match between $1/|H(e^{j\omega})|$ and $|W_o(e^{j\omega})|$ is
observed. On the other hand, at those frequencies where $1/|H(e^{j\omega})|$ is large, the deviation
between the two increases. We may also note that $|W_o(e^{j\omega})|$ remains less than $1/|H(e^{j\omega})|$,
for all values of $\omega$.

Figure 3.11 Plots of $1/|H(e^{j\omega})|$ and $|W_o(e^{j\omega})|$.

This is consistent with the conclusion drawn previously because a small value of
$1/|H(e^{j\omega})|$ implies that $|H(e^{j\omega})|$ is large and thus, according to Eq. (3.127), $\rho(e^{j\omega})$
is also large. This, in turn, implies that the ratio $\rho(e^{j\omega})/(1 + \rho(e^{j\omega}))$ is close to 1; hence,
from Eq. (3.128), we get

$$W_o(e^{j\omega}) \approx \frac{1}{H(e^{j\omega})}$$

Similarly, the same argument may be used to explain why $|W_o(e^{j\omega})|$ is significantly smaller
than $1/|H(e^{j\omega})|$ when the latter is large. Furthermore, the fact that $|W_o(e^{j\omega})|$ remains less
than $1/|H(e^{j\omega})|$, for all values of $\omega$, is predicted by Eq. (3.128).
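The observations of this example can be reproduced with a few lines of Python. The sketch below evaluates the channel of Example 3.4 and the equalizer of Eq. (3.125) on a grid of frequencies between 0 and $\pi$, using $\Phi_{ss} = 1$ and $\Phi_{vv} = 0.04$ as in this example. The grid size and the function names are arbitrary, and the code tabulates values rather than reproducing Figure 3.11 itself.

```python
import cmath
import math

PHI_SS = 1.0      # white binary data with unit power
SIGMA_V2 = 0.04   # noise variance of this example


def H(omega):
    """Channel frequency response for H(z) = -0.4 + z^-1 - 0.4 z^-2."""
    z_inv = cmath.exp(-1j * omega)
    return -0.4 + z_inv - 0.4 * z_inv ** 2


def W_o(omega):
    """Optimum equalizer of Eq. (3.125) evaluated on the unit circle."""
    h = H(omega)
    return h.conjugate() * PHI_SS / (PHI_SS * abs(h) ** 2 + SIGMA_V2)


def scale_factor(omega):
    """rho / (1 + rho) of Eq. (3.128), with rho defined in Eq. (3.127)."""
    rho = PHI_SS * abs(H(omega)) ** 2 / SIGMA_V2
    return rho / (1.0 + rho)


omegas = [math.pi * k / 64 for k in range(65)]
rows = [(w, 1 / abs(H(w)), abs(W_o(w)), scale_factor(w)) for w in omegas]

for w, inv_h, w_mag, factor in rows[::16]:
    print(f"omega={w:.3f}  1/|H|={inv_h:.4f}  |W_o|={w_mag:.4f}  factor={factor:.4f}")
```

At $\omega = 0$, where the channel gain is smallest, the factor is 0.5 and the equalizer gain is only half of the channel inverse. At $\omega = \pi$, where the channel gain is largest, the factor is close to 1 and the two curves nearly coincide.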


3.6.5 Noise Cancellation 


Figure 3.12 depicts a typical noise canceler setup. There are two inputs to this setup: a 
signal source, s(n), and a noise source, v(n). These two signals, which are assumed to 
be uncorrelated with each other, are mixed together through the system functions H (z) 
and G(z), and result in the primary input, d(n), and reference input, x(n), as shown in 
Figure 3.12. The reference input is passed through a Wiener filter W (z), which is designed 
so that the difference between the primary input and the filter output is minimized in the 
mean-square sense. The noise canceler output is the error sequence e(n). The aim of a 
noise canceler setup, as explained above, is to extract the signal s(n) from the primary 
input d (n). 
We note that

$$x(n) = v(n) + h_n \star s(n) \tag{3.131}$$

and

$$d(n) = s(n) + g_n \star v(n) \tag{3.132}$$

where $h_n$ and $g_n$ are the impulse responses of the filters H(z) and G(z), respectively.


Figure 3.12 Noise canceler setup.

Noting that $s(n)$ and $v(n)$ are uncorrelated with each other and recalling the results of
Section 2.4.4, we obtain, from Eq. (3.131),

$$\Phi_{xx}(z) = \Phi_{vv}(z) + \Phi_{ss}(z)|H(z)|^2 \tag{3.133}$$

To find $\Phi_{dx}(z)$, we note that $d(n)$ and $x(n)$ are related with each other through the signal
sequences $s(n)$ and $v(n)$ and the filters $H(z)$ and $G(z)$. As $s(n)$ and $v(n)$ are uncorrelated
with each other, their contributions to $\Phi_{dx}(z)$ may be considered separately. In particular,
we may write

$$\Phi_{dx}(z) = \Phi_{dx}^{s}(z) + \Phi_{dx}^{v}(z) \tag{3.134}$$

where $\Phi_{dx}^{s}(z)$ is $\Phi_{dx}(z)$ when $v(n) = 0$, for all values of $n$, and $\Phi_{dx}^{v}(z)$ is
$\Phi_{dx}(z)$ when $s(n) = 0$, for all values of $n$. Thus, we obtain

$$\Phi_{dx}^{s}(z) = H^*(z)\Phi_{ss}(z) \tag{3.135}$$

and

$$\Phi_{dx}^{v}(z) = G(z)\Phi_{vv}(z) \tag{3.136}$$


Substituting Eqs. (3.135) and (3.136) in Eq. (3.134), we get

$$\Phi_{dx}(z) = H^*(z)\Phi_{ss}(z) + G(z)\Phi_{vv}(z) \tag{3.137}$$

Using Eqs. (3.133) and (3.137) in Eq. (3.93), we obtain

$$W_o(z) = \frac{H^*(z)\Phi_{ss}(z) + G(z)\Phi_{vv}(z)}{\Phi_{vv}(z) + \Phi_{ss}(z)|H(z)|^2} \tag{3.138}$$

A comparison of Eq. (3.138) with Eqs. (3.106) and (3.125) reveals that Eq. (3.138) may
be thought of as a generalization of the results we obtained in the last two sections for the
modeling and inverse modeling scenarios. In fact, if we refer to Figure 3.12, we can
easily find that the modeling and inverse modeling scenarios are embedded in the noise
canceler setup. While trying to minimize the mean-squared value of the output error, one
must strike a balance between noise cancellation and signal cancellation at the output of
the noise canceler. Cancellation of the noise $v(n)$ occurs when the Wiener filter $W(z)$ is
chosen to be close to $G(z)$, and cancellation of the signal $s(n)$ occurs when $W(z)$ is close
to the inverse of $H(z)$. In this sense, we may note that the noise canceler treats $s(n)$ and
$v(n)$ without making any distinction between them and tries to cancel both of them as
much as possible so as to achieve the minimum mean-squared error in $e(n)$. This seems
contrary to the main goal of the noise canceler, which is meant to cancel only the noise.
The following discussion aims at revealing some of the peculiar characteristics of the
noise canceler setup and showing under which conditions an acceptable cancellation occurs.

To proceed with our discussion, we define $\rho_{\rm pri}(e^{j\omega})$, $\rho_{\rm ref}(e^{j\omega})$, and
$\rho_{\rm out}(e^{j\omega})$ as the signal-to-noise power spectral density ratios at the primary input,
reference input, and output, respectively. By direct inspection of Figure 3.12 and
application of Eq. (2.85), we obtain

$$\rho_{\rm pri}(e^{j\omega}) = \frac{\Phi_{ss}(e^{j\omega})}{|G(e^{j\omega})|^2\,\Phi_{vv}(e^{j\omega})} \tag{3.139}$$

and

$$\rho_{\rm ref}(e^{j\omega}) = \frac{|H(e^{j\omega})|^2\,\Phi_{ss}(e^{j\omega})}{\Phi_{vv}(e^{j\omega})} \tag{3.140}$$


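To derive a similar equation for $\rho_{\rm out}(e^{j\omega})$, we note that $s(n)$ reaches the canceler
output through two routes: one direct and one through the cascade of $H(z)$ and $W(z)$. This gives

$$\Phi_{ee}^{s}(e^{j\omega}) = |1 - H(e^{j\omega})W(e^{j\omega})|^2\,\Phi_{ss}(e^{j\omega}) \tag{3.141}$$

where the superscript $s$ refers to the portion of $\Phi_{ee}(e^{j\omega})$ which comes from $s(n)$.
Similarly, $v(n)$ reaches the output through the routes $G(z)$ and $W(z)$. Thus,

$$\Phi_{ee}^{v}(e^{j\omega}) = |G(e^{j\omega}) - W(e^{j\omega})|^2\,\Phi_{vv}(e^{j\omega}) \tag{3.142}$$

Replacing $W(e^{j\omega})$ by $W_o(e^{j\omega})$ and using Eq. (3.138) in Eqs. (3.141) and (3.142), we
obtain

$$\Phi_{ee}^{s}(e^{j\omega}) = \frac{|1 - G(e^{j\omega})H(e^{j\omega})|^2\,\Phi_{vv}^2(e^{j\omega})}{[\Phi_{vv}(e^{j\omega}) + |H(e^{j\omega})|^2\Phi_{ss}(e^{j\omega})]^2}\,\Phi_{ss}(e^{j\omega}) \tag{3.143}$$

and

$$\Phi_{ee}^{v}(e^{j\omega}) = \frac{|H(e^{j\omega})|^2\,|1 - G(e^{j\omega})H(e^{j\omega})|^2\,\Phi_{ss}^2(e^{j\omega})}{[\Phi_{vv}(e^{j\omega}) + |H(e^{j\omega})|^2\Phi_{ss}(e^{j\omega})]^2}\,\Phi_{vv}(e^{j\omega}) \tag{3.144}$$

respectively. Hence, $\rho_{\rm out}(e^{j\omega})$ can now be obtained as

$$\rho_{\rm out}(e^{j\omega}) = \frac{\Phi_{ee}^{s}(e^{j\omega})}{\Phi_{ee}^{v}(e^{j\omega})} = \frac{\Phi_{vv}(e^{j\omega})}{|H(e^{j\omega})|^2\,\Phi_{ss}(e^{j\omega})} \tag{3.145}$$

Comparing Eq. (3.145) with Eq. (3.140), we find that

$$\rho_{\rm out}(e^{j\omega}) = \frac{1}{\rho_{\rm ref}(e^{j\omega})} \tag{3.146}$$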

This is known as power inversion (Widrow, McCool, and Ball, 1975). It shows that the 
signal-to-noise power spectral density ratio at the noise canceler output is equal to the 
inverse of the signal-to-noise power spectral density ratio at the reference input. This 
means that if the signal-to-noise power spectral density ratio at the reference input is 
low, we should expect a good cancellation of the noise at the output. On the other hand, 
we should expect a poor performance from the canceler when the signal-to-noise power 
spectral density ratio at the reference input is high. This surprising result suggests that 
the noise canceler works better in situations when noise level is high and signal level is 
low. The following example gives a clear picture of this general result. 
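Before turning to the example, the power inversion relation can be checked numerically at a single frequency. The sketch below evaluates Eq. (3.138), forms the output ratio from Eqs. (3.141) and (3.142), and compares it with Eq. (3.140). The function names and all numerical values are hypothetical and chosen only for illustration; the one requirement is that $G H \neq 1$ at the chosen frequency, so that the output components do not vanish.

```python
# Numerical check of power inversion, Eq. (3.146), at one frequency.
# All numbers are hypothetical; they are not taken from the text.

def wiener_canceler(H, G, phi_ss, phi_vv):
    """Unconstrained Wiener filter of Eq. (3.138), evaluated at one frequency."""
    return (H.conjugate() * phi_ss + G * phi_vv) / (phi_vv + phi_ss * abs(H) ** 2)

def snr_out(H, G, phi_ss, phi_vv):
    """Output signal-to-noise PSD ratio built from Eqs. (3.141) and (3.142)."""
    W = wiener_canceler(H, G, phi_ss, phi_vv)
    phi_ee_s = abs(1 - H * W) ** 2 * phi_ss  # signal part of the output PSD
    phi_ee_v = abs(G - W) ** 2 * phi_vv      # noise part of the output PSD
    return phi_ee_s / phi_ee_v

def snr_ref(H, phi_ss, phi_vv):
    """Reference-input signal-to-noise PSD ratio, Eq. (3.140)."""
    return abs(H) ** 2 * phi_ss / phi_vv

H = 0.3 - 0.2j               # assumed value of H(e^{jw}) at the chosen frequency
G = 0.8 + 0.5j               # assumed value of G(e^{jw}) at the chosen frequency
phi_ss, phi_vv = 2.0, 0.5    # assumed signal and noise PSD values

rho_out = snr_out(H, G, phi_ss, phi_vv)
rho_ref = snr_ref(H, phi_ss, phi_vv)
print(rho_out * rho_ref)     # the product should be 1 up to rounding error
```

The same helper also reproduces the two limiting cases mentioned earlier: with `phi_ss = 0` it returns `G` (pure modeling), and with `phi_vv = 0` it returns `1/H` (pure inverse modeling).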


Example 3.6 


To demonstrate how the power inversion property of the noise canceler may be utilized 
in practice, we consider a receiver with two omnidirectional (equally sensitive to all 
directions) antennas, A and B, as shown in Figure 3.13. 

A desired signal $s(n) = \alpha(n)\cos n\omega_o$ arrives in the direction perpendicular to the line
connecting A and B. An interferer (jammer) signal $v(n) = \beta(n)\cos n\omega_o$ arrives at an angle
$\theta_o$ with respect to the direction of $s(n)$. The amplitudes $\alpha(n)$ and $\beta(n)$ are narrow-band
baseband signals. This implies that $s(n)$ and $v(n)$ are narrow-band signals concentrated
around $\omega = \omega_o$. Such signals may be treated as single tones, and thus a filter with two
degrees of freedom is sufficient for any linear filtering that may have to be performed
on them. This is why only a two-tap linear combiner is considered in Figure 3.13. This
is expected to perform almost as well as any unconstrained linear (Wiener) filter.
We also assume that $\alpha(n)$ and $\beta(n)$ are zero-mean and uncorrelated with each other. The
two omnis are separated by a distance of $l$ meters. The linear combiner coefficients are
adjusted so that the output error, $e(n)$, is minimized in the mean-square sense.

Figure 3.13 A receiver with two omnidirectional antennas.

The desired signal, $s(n)$, arrives at the same time at both omnis. However, $v(n)$ arrives
at B first, and arrives at A with a delay

$$\delta = \frac{l \sin\theta_o}{c} \tag{3.147}$$

where $c$ is the propagation speed. To add this to the time index $n$, it has to be normalized
by the time step $T$, which corresponds to one increment of $n$. This gives

$$\delta_o = \frac{l \sin\theta_o}{cT} \tag{3.148}$$
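As a quick illustration of the normalization in Eq. (3.148), the following sketch computes $\delta_o$ for one set of numbers. The function name and the numerical values (antenna spacing, arrival angle, propagation speed, and sampling period) are assumptions made only for this illustration.

```python
import math

def normalized_delay(l, theta_o, c, T):
    """Delay of Eq. (3.148), l*sin(theta_o)/(c*T), in units of the time step T."""
    return l * math.sin(theta_o) / (c * T)

# Hypothetical numbers: 0.5 m spacing, 30 degree arrival angle,
# propagation speed 3e8 m/s, time step of 1 ns.
delta_o = normalized_delay(0.5, math.radians(30.0), 3.0e8, 1.0e-9)
print(delta_o)  # a fraction of one sample for these values
```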


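Noting these, in Figure 3.13, we have

$$d(n) = \alpha(n)\cos n\omega_o + \beta(n)\cos[(n - \delta_o)\omega_o] \tag{3.149}$$

$$x(n) = \alpha(n)\cos n\omega_o + \beta(n)\cos n\omega_o \tag{3.150}$$

$$\tilde{x}(n) = \alpha(n)\sin n\omega_o + \beta(n)\sin n\omega_o \tag{3.151}$$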


It may be noted that in Eq. (3.149), we have used $\beta(n)$ instead of $\beta(n - \delta_o)$. This
simplifies the following equations and, in practice, is valid to a very good
approximation because of the narrow bandwidth of $\beta(n)$, which implies that its variation
in time is slow, and because of the small size of $\delta_o$.

To find the optimum coefficients of the linear combiner, we shall derive and solve the
Wiener-Hopf equation governing the linear combiner. We note that, here,

$$\mathbf{R} = \begin{bmatrix} E[x^2(n)] & E[x(n)\tilde{x}(n)] \\ E[\tilde{x}(n)x(n)] & E[\tilde{x}^2(n)] \end{bmatrix} \tag{3.152}$$

Also,

$$E[x^2(n)] = E[(\alpha(n)\cos n\omega_o + \beta(n)\cos n\omega_o)^2] \tag{3.153}$$

Expanding Eq. (3.153) and recalling that $\alpha(n)$ and $\beta(n)$ are uncorrelated with each other,
we obtain

$$E[x^2(n)] = \frac{\sigma_\alpha^2 + \sigma_\beta^2}{2} + \frac{1}{2}\left(\sigma_\alpha^2 E[\cos 2n\omega_o] + \sigma_\beta^2 E[\cos 2n\omega_o]\right) \tag{3.154}$$

where $\sigma_\alpha^2$ and $\sigma_\beta^2$ are the variances of $\alpha(n)$ and $\beta(n)$, respectively. Also,
$E[\cos 2n\omega_o]$ is replaced by its time average.² This is assumed to be zero. Thus, we obtain

$$E[x^2(n)] = \frac{\sigma_\alpha^2 + \sigma_\beta^2}{2} \tag{3.155}$$

Similarly, one can get

$$E[\tilde{x}^2(n)] = \frac{\sigma_\alpha^2 + \sigma_\beta^2}{2} \tag{3.156}$$

and

$$E[x(n)\tilde{x}(n)] = 0 \tag{3.157}$$

Substituting these in Eq. (3.152), we have

$$\mathbf{R} = \frac{\sigma_\alpha^2 + \sigma_\beta^2}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{3.158}$$

It is also straightforward to show that

$$\mathbf{p} = \begin{bmatrix} E[d(n)x(n)] \\ E[d(n)\tilde{x}(n)] \end{bmatrix} = \frac{1}{2}\begin{bmatrix} \sigma_\alpha^2 + \sigma_\beta^2\cos\delta_o\omega_o \\ \sigma_\beta^2\sin\delta_o\omega_o \end{bmatrix} \tag{3.159}$$

Using Eqs. (3.158) and (3.159) in the Wiener-Hopf equation $\mathbf{R}\mathbf{w}_o = \mathbf{p}$, we get

$$\mathbf{w}_o = \begin{bmatrix} \dfrac{\sigma_\alpha^2 + \sigma_\beta^2\cos\delta_o\omega_o}{\sigma_\alpha^2 + \sigma_\beta^2} \\[2ex] \dfrac{\sigma_\beta^2\sin\delta_o\omega_o}{\sigma_\alpha^2 + \sigma_\beta^2} \end{bmatrix} \tag{3.160}$$

² Strictly speaking, replacing $E[\cos 2n\omega_o]$ by the time average of the periodic sequence
$\cos 2n\omega_o$ does not fit into the conventional definitions of stochastic processes. The sequence
$\cos 2n\omega_o$ is deterministic, and thus it does not really make sense to talk about its
expectation, which is conventionally defined as an ensemble average. On the other hand, on many
occasions in adaptive filters (such as our example here), the signals involved are deterministic,
and time averages are used to evaluate the performance of the filters and/or calculate their
parameters. It is in this context that we replace statistical expectations by time averages. We
may note that the problem stated in Example 3.6 could also be put in a more statistical form to
avoid the above arguments; see Problem P3.30. Here, we have decided not to do this, in order to
emphasize the fact that, in practice, time averages are used instead of statistical expectations.

The optimized output of the receiver is

$$e_o(n) = d(n) - \mathbf{w}_o^T\mathbf{x}(n) \tag{3.161}$$

where $\mathbf{x}(n) = [x(n)\ \ \tilde{x}(n)]^T$. Using Eqs. (3.149), (3.150), (3.151), and (3.160) in
Eq. (3.161), we get, after some manipulations,

$$e_o(n) = \frac{\cos n\omega_o - \cos[(n - \delta_o)\omega_o]}{\sigma_\alpha^2 + \sigma_\beta^2}\left(\sigma_\beta^2\alpha(n) - \sigma_\alpha^2\beta(n)\right) \tag{3.162}$$

Now, by inspection of Eqs. (3.150) and (3.162), we find that

$$\text{signal-to-noise ratio at the reference input} = \frac{\sigma_\alpha^2}{\sigma_\beta^2} \tag{3.163}$$

and

$$\text{signal-to-noise ratio at the output} = \frac{\sigma_\beta^4\sigma_\alpha^2}{\sigma_\alpha^4\sigma_\beta^2} = \frac{\sigma_\beta^2}{\sigma_\alpha^2} \tag{3.164}$$

which match the power inversion equation (3.146).
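The "manipulations" leading to Eq. (3.162) can be spot-checked numerically. The sketch below computes $e_o(n)$ directly from Eq. (3.161), using the weights of Eq. (3.160) and the signals of Eqs. (3.149) to (3.151), and compares it with the closed form of Eq. (3.162). The function names, the variances, the delay, the carrier frequency, and the amplitude samples are all hypothetical values picked for illustration.

```python
import math

def combiner_weights(var_a, var_b, delta_o, w_o):
    """Optimum two-tap weights of Eq. (3.160)."""
    total = var_a + var_b
    return ((var_a + var_b * math.cos(delta_o * w_o)) / total,
            var_b * math.sin(delta_o * w_o) / total)

def output_direct(n, a, b, var_a, var_b, delta_o, w_o):
    """e_o(n) from Eq. (3.161), with d, x and x-tilde from Eqs. (3.149)-(3.151)."""
    d = a * math.cos(n * w_o) + b * math.cos((n - delta_o) * w_o)
    x = (a + b) * math.cos(n * w_o)
    x_tilde = (a + b) * math.sin(n * w_o)
    w0, w1 = combiner_weights(var_a, var_b, delta_o, w_o)
    return d - (w0 * x + w1 * x_tilde)

def output_closed_form(n, a, b, var_a, var_b, delta_o, w_o):
    """e_o(n) from the closed form of Eq. (3.162)."""
    scale = (math.cos(n * w_o) - math.cos((n - delta_o) * w_o)) / (var_a + var_b)
    return (var_b * a - var_a * b) * scale

var_a, var_b = 0.01, 1.0   # hypothetical variances of alpha(n) and beta(n)
delta_o, w_o = 0.4, 0.9    # hypothetical normalized delay and carrier frequency
samples = [(0, 0.1, -1.2), (3, -0.05, 0.7), (10, 0.2, 0.3)]  # (n, alpha(n), beta(n))

worst = max(abs(output_direct(n, a, b, var_a, var_b, delta_o, w_o)
                - output_closed_form(n, a, b, var_a, var_b, delta_o, w_o))
            for n, a, b in samples)

# Ratios of Eqs. (3.163) and (3.164); their product is 1 (power inversion).
snr_ref = var_a / var_b
snr_out = (var_b ** 2 * var_a) / (var_a ** 2 * var_b)
print(worst, snr_ref * snr_out)
```

With a weak desired signal and a strong jammer, as assumed here, the output ratio is the reciprocal of the small reference ratio, which is the situation in which the canceler is expected to work well.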


3.7 Summary and Discussion 


In this chapter, we reviewed a class of optimum linear systems collectively known as 
Wiener filters. We noted that the performance function used in formulating the Wiener 
filters is an elegant choice, which leads to a mathematically tractable problem. We dis- 
cussed the Wiener filters in the context of discrete-time signals and systems, and presented 
different formulations of the Wiener filtering problem. We started with the Wiener filter- 
ing problem for a FIR filter. The case of real-valued signals was dealt with first, and the 
formulation was then extended to the case of complex-valued signals. 

The unconstrained Wiener filters were also discussed in detail. By unconstrained, we 
mean there is no constraint on the duration of the impulse response of the filter. It 
may extend from time n = —oo to n = +00. This study, although nonrealistic in actual 
implementation, turned out to be very instructive in revealing many aspects of the Wiener 
filters, which could not be easily perceived when the duration of the filter impulse response 
is limited. 

The eminent features of the Wiener filters that were observed are as follows:

• For a transversal Wiener filter, the performance function is a quadratic function of its
  tap weights with a single global minimum. The set of tap weights that minimizes the
  Wiener filter cost function can be obtained analytically by solving a set of simultaneous
  linear equations known as the Wiener-Hopf equation.

• When the optimum Wiener filter is used, the estimation error is uncorrelated with the
  input samples of the filter. This property of Wiener filters, which is referred to as the
  principle of orthogonality, is useful and handy for many related derivations.

• The Wiener filter can also be viewed as a correlation canceler in the sense that the optimum
  Wiener filter cancels that part of the desired output that is correlated with its input,
  while generating the estimation error.

• In the case of unconstrained Wiener filters, the Wiener filter treats different frequency
  components of the underlying processes separately. In particular, the Wiener filter
  transfer function at any particular frequency depends only on the power spectral density
  of the filter input and the crosspower spectral density between the filter input and its
  desired output at that frequency.


The last property, although it could only be derived in the case of unconstrained Wiener
filters, is also approximately valid when the filter length is constrained. The concept of
power spectra and their influence on the performance of Wiener filters is fundamental
for understanding the behavior of adaptive filters. We note that the adaptive filters, as
commonly implemented, are aimed at implementing Wiener filters. In this chapter, we
observed that the optimum coefficients of the Wiener filter are a function of the
autocorrelation function of the filter input and the crosscorrelation function between the filter
input and its desired output. As correlation functions and power spectra are uniquely
related, we also observed that the optimum coefficients can be expressed in terms of the
corresponding power spectra instead of correlation functions. In the next few chapters, we
will show that the convergence behavior of adaptive filters is closely related to the power
spectrum of their inputs. In the rest of this book, we will be making frequent references
to the results derived in this chapter.
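As a compact reminder of the first feature listed above, the following sketch solves the Wiener-Hopf equation $\mathbf{R}\mathbf{w}_o = \mathbf{p}$ for a real-valued two-tap filter and evaluates the quadratic performance function $\xi = E[d^2(n)] - 2\mathbf{w}^T\mathbf{p} + \mathbf{w}^T\mathbf{R}\mathbf{w}$ at and away from the optimum. The statistics $\mathbf{R}$, $\mathbf{p}$, and $E[d^2(n)]$ used here are hypothetical and are not taken from any problem in this chapter; the function names are likewise only illustrative.

```python
def solve_wiener_hopf_2tap(R, p):
    """Solve R w = p for a 2-by-2 correlation matrix R by Cramer's rule."""
    (a, b), (c, d) = R
    det = a * d - b * c
    return [(d * p[0] - b * p[1]) / det, (a * p[1] - c * p[0]) / det]

def performance(w, R, p, d_power):
    """Quadratic performance function E[d^2] - 2 w^T p + w^T R w (real-valued case)."""
    quad = sum(w[i] * R[i][j] * w[j] for i in range(2) for j in range(2))
    cross = sum(w[i] * p[i] for i in range(2))
    return d_power - 2.0 * cross + quad

# Hypothetical second-order statistics of a two-tap problem.
R = [[1.0, 0.5], [0.5, 1.0]]
p = [0.5, 0.25]
d_power = 1.0

w_opt = solve_wiener_hopf_2tap(R, p)
xi_min = performance(w_opt, R, p, d_power)
xi_off = performance([0.6, 0.1], R, p, d_power)  # any other point gives a larger value
print(w_opt, xi_min, xi_off)
```

At the optimum, the value of the performance function coincides with $E[d^2(n)] - \mathbf{w}_o^T\mathbf{p}$, which is the minimum mean-squared error expression used throughout the chapter.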


Problems 


P3.1 Consider a two-tap Wiener filter with the following statistics: 


te. Tio fi 
Bie = 2, BS lay al ete 


(i) Use the above information to obtain an expression for the performance func- 
tion of the filter. 

(ii) By letting the derivatives of the performance function with respect to the 
involved variables equal to zero, obtain the optimum values of the filter tap 
weights. 

(iii) Insert the result obtained in (ii) in the performance function expression to 
obtain the minimum mean-squared error of the filter. 

(iv) Find the optimum tap weights of the filter and its minimum mean-squared 
error using the equations derived in this chapter to confirm the results 
obtained in (ii) and (iii). 


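P3.2 Consider a three-tap Wiener filter with the following statistics:

$$E[d^2(n)] = 10, \quad \mathbf{R} = \begin{bmatrix} 1 & 0.5 & 0.25 \\ 0.5 & 1 & 0.5 \\ 0.25 & 0.5 & 1 \end{bmatrix}, \quad \mathbf{p} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$$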


Repeat Steps (i) through (iv) of Problem P3.1. 


P3.3 In Section 3.2, the Wiener-Hopf and minimum mean-squared error equations
were derived for a transversal filter. Derive similar equations when the transversal
filter is replaced by a linear combiner similar to the one presented in Figure 1.3.


Figure P3.4 (input: unit variance white process)


P3.4 Consider the modeling problem shown in Figure P3.4. For a = 0, 


(i) find the correlation matrix R of the filter tap inputs and the crosscorrelation 
vector p between the filter tap inputs and its desired output. 

(ii) find the optimum tap weights of the Wiener filter and the minimum mean- 
squared error at the filter output. 


P3.5 Repeat Problem P3.4 when a = 1. 
P3.6 Repeat Problems P3.4 and P3.5 when Figure P3.4 is replaced by Figure P3.6. 


Figure P3.6 (input: unit variance white process; filter: $w_0 + w_1 z^{-1}$)


P3.7 Consider the modeling problem shown in Figure P3.7. 


(i) Find the correlation matrix R of the filter tap inputs and the crosscorrelation 
vector p between the filter tap inputs and its desired output. 
(ii) Find the optimum tap weights of the Wiener filter. 
(iii) What is the minimum mean-squared error? Obtain this analytically as well
as by direct inspection of Figure P3.7.


Figure P3.7 (input: unit variance white process; additive noise of variance 0.1)


P3.8 Consider the channel equalization problem shown in Figure P3.8. The data sym- 
bols, s(n), are assumed to be samples of a stationary white process. 


(i) Find the correlation matrix R of the equalizer tap inputs and the crosscorre- 
lation vector p between the equalizer tap inputs and the desired output. 
(ii) Find the optimum tap weights of the equalizer. 
(iii) What is the minimum mean-squared error at the equalizer output?
(iv) Could you guess the results obtained in (ii) and (iii) without going through 
the derivations? How and why? 


channel equalizer 


Figure P3.8 


P3.9 In Section 3.5, we emphasized that for a complex variable $w$

$$\nabla_w^C f(w) = 0 \tag{P3.9.1}$$

does not imply that

$$\frac{\partial f(w)}{\partial w_R} = \frac{\partial f(w)}{\partial w_I} = 0 \tag{P3.9.2}$$

in general. In this problem, we want to elaborate on this further.

(i) Assume that $f(w) = w^2$ and show that for this function Eq. (P3.9.1) is true,
but Eq. (P3.9.2) is false.
Can you extend this result to the case when

$$f(w) = \sum_{i=0}^{K} a_i w^i$$

and the $a_i$'s are fixed real or complex coefficients?
(ii) What is your answer to the case when $f(w) = w^* w^2$? The asterisk denotes
complex conjugation.


P3.10 Work out the details of the derivation of Eq. (3.74).


P3.11 In Section 3.5, for the complex-valued signals, we used the principle of
orthogonality to derive the Wiener-Hopf equation and the minimum mean-squared error
(3.74). Starting with the definition of the performance function, derive an equation
similar to Eq. (3.12) for the case of complex-valued signals. Use this equation to
give a direct derivation for the Wiener-Hopf equation, in the present case. Also,
confirm the minimum mean-squared error equation (3.74).


P3.12 Show that for a Wiener filter with complex-valued tap-input vector $\mathbf{x}(n)$ and
optimum tap-weight vector $\mathbf{w}_o$

$$\mathbf{w}_o^H\mathbf{p} = E[|\mathbf{w}_o^H\mathbf{x}(n)|^2]$$

where $\mathbf{p} = E[d(n)\mathbf{x}^*(n)]$ and $d(n)$ is the desired output of the filter. Use this
result to argue that $\mathbf{w}_o^H\mathbf{p}$ is always positive. Also, use the above result to derive
an equation similar to Eq. (3.50), for the general case of Wiener filters with
complex-valued signals.


P3.13 Consider the channel equalization problem depicted in Figure P3.13. Assume that
the underlying processes are real-valued with

$$E[s(n)s(n-k)] = \begin{cases} 1, & k = 0 \\ 0, & k \neq 0 \end{cases}$$

and

$$E[v(n)v(n-k)] = \begin{cases} \sigma_v^2, & k = 0 \\ 0, & k \neq 0 \end{cases}$$


Figure P3.13

(i) For $\sigma_v^2 = 0$, obtain the equalizer tap weights by direct solution of the
Wiener-Hopf equation. To be sure of your results, you may also guess
the equalizer tap weights and compare them with the calculated ones.

(ii) Find the equalizer tap weights when $\sigma_v^2 = 0.1$ and compare the results with
what you have obtained in (i).

(iii) Plot the magnitude and phase responses of the two designs obtained above
and compare the results.


P3.14 Consider the equalization setup in Figure P3.14. Assume that the input data $s(n)$
is binary and takes equally probable values of $\pm 1$. Also, assume that the data
symbols, $s(m)$ and $s(n)$, for $m \neq n$ are uncorrelated with one another. $\Delta$ is a
delay that should be chosen to optimize the performance of the equalizer.

Find the equalizer tap weights $w_0, w_1, \ldots$ and the minimum mean-squared
error, $\xi_{\min}$, at the equalizer output for the choices of $\Delta = 0, 1, 2, \ldots, 7$. Discuss
your observation on how $\xi_{\min}$ varies with $\Delta$.

Note: You may use MATLAB or any other numerical software to complete your
answer to this problem.


Channel Equalizer 


Figure P3.14 


P3.15 Repeat Problem P3.14 when the channel transfer function is replaced by 1 + 
2g} +277. 


P3.16 Consider the system modeling problem of Figure P3.16. Assume that $x(n)$ and
$v_o(n)$ are both white processes and $\sigma_x^2 = 1$ and $\sigma_{v_o}^2 = 0.01$.


(i) Find an expression for the performance function $\xi$.
(ii) Present a plot of $\xi$ versus $w$ for values of $w$ in the range of $-1$ to $+1$ and
show that it has a minimum at $w = 0.5$.


Figure P3.16


P3.17 Consider the system modeling problem of Figure P3.17. Assume that $x(n)$ and
$v_o(n)$ are both white processes and $\sigma_x^2 = 1$ and $\sigma_{v_o}^2 = 0.001$.


(i) Find an expression for the performance function $\xi$ in terms of the parameters
$w_0$ and $w_1$.
(ii) Present a mesh plot of $\xi$ versus the parameters $w_0$ and $w_1$.
(iii) Show that at the point where $(w_0, w_1) = (0.5, -0.5)$, $\xi(w_0, w_1) = \sigma_{v_o}^2$. Is
there any other choice of $(w_0, w_1)$ for which $\xi(w_0, w_1) = \sigma_{v_o}^2$ also holds? Explain
your answer.


Figure P3.17


P3.18 Recalling that in any Wiener filter

$$\xi_{\min} = E[|d(n)|^2] - E[|y_o(n)|^2]$$

prove Eq. (3.97) by first deriving an expression for $E[|y_o(n)|^2]$ and then replacing
the result in the above equation.


P3.19 By following a procedure similar to the one given in Section 3.6.1, show that
when the involved processes and system parameters are complex-valued

$$\xi = \phi_{dd}(0) + \phi_{yy}(0) - 2\Re\{\phi_{yd}(0)\}$$

where $\Re\{x\}$ denotes the real part of $x$. Proceed with this result to develop the
dual of equation (3.78).


P3.20 Show that Eq. (3.93) is a valid result even when the involved processes are
complex-valued.


P3.21 Consider Figure P3.21, in which $x_i(n)$ and $d_i(n)$ are the outputs of two similar
narrow-band filters centered at $\omega = \omega_i$, as in Figure 3.7. Show that if $w_{i,o}$ is
the optimum value of $w_i$ which minimizes the mean-squared value of the output error,
$e_i(n)$, then

$$w_{i,o} \approx \frac{\Phi_{dx}(e^{j\omega_i})}{\Phi_{xx}(e^{j\omega_i})}$$


Figure P3.21


P3.22 Assuming that $W_o(z)$ is the optimum system function of an FIR filter, show that
Eq. (3.96) can be converted to Eq. (3.74) and vice versa.


P3.23 Give a detailed derivation of Eq. (3.122) from Eq. (3.120). 
P3.24 Give a detailed derivation of Eq. (3.123). 


P3.25 For the noise canceler setup shown in Figure 3.12, consider the case when
$\Phi_{ss}(z)|H(z)|^2 \ll \Phi_{vv}(z)$. Recall that the term signal is used to refer to $s(n)$
and the term noise is used to refer to $v(n)$.


(i) Show that, in this case,

$$W_o(z) \approx G(z) + H^*(z)\frac{\Phi_{ss}(z)}{\Phi_{vv}(z)}$$

(ii) Show that the power spectral density of the noise reaching the noise canceler
output is

$$\Phi_{\text{output noise}}(z) \approx \Phi_{vv}(z)\,\rho_{\rm ref}(z)\,\rho_{\rm pri}(z)\,|G(z)|^2$$

(iii) Define the signal distortion at the canceler output, $D(z)$, as the ratio of the
power spectral density of the signal propagating through $W_o(z)$ to the output
to the power spectral density of the signal at the primary input. Show that

$$D(z) \approx |H(z)G(z) + \rho_{\rm ref}(z)|^2$$

(iv) Show that the result obtained in (iii) may be written as

$$D(z) \approx \frac{\rho_{\rm ref}(z)}{\rho_{\rm pri}(z)}$$

when $\rho_{\rm ref}(z) \ll |H(z)|\,|G(z)|$.


Figure P3.26


P3.26 Consider the noise canceler setup shown in Figure P3.26.


(i) Derive an unconstrained Wiener filter $W_o(z)$.
(ii) Show that the power inversion formula (3.146) is also valid for this setup.


P3.27 Consider an array of three omnidirectional antennas, as shown in Figure P3.27.
The signal, $s(n)$, and jammer, $v(n)$, are narrow-band processes, as in Example
3.6. To cancel the jammer, we use a two-tap filter, similar to the one used in
Figure 3.13, at either of the points 1 or 2, in Figure P3.27.


(i) To maximize the cancellation of the jammer, where will you place the two-tap
filter?
(ii) For your choice in (i), find the optimum values of the filter tap weights.
(iii) Find an expression for the signal and jammer components reaching the
canceler output, and confirm the power inversion formula.


Figure P3.27


P3.28 Consider an array of three omnidirectional antennas as shown in Figure P3.28. The 
signal, s(n), and jammer, v(n), are narrow-band processes, as in Example 3.6. 


(i) Find the optimum values of the filter tap weights, which minimize the
mean-squared value of the output error, $e(n)$.

(ii) Find an expression for the canceler output, and confirm the power inversion 
formula. 


Figure P3.28


P3.29 Repeat P3.28 for the array shown in Figure P3.29, and compare the results
obtained with those of P3.28.


Figure P3.29


P3.30 To avoid time averages and derive the results presented in Example 3.6 through
ensemble averages, the desired signal and jammer may be redefined as
$s(n) = \alpha(n)\cos(n\omega_o + \varphi_1)$ and $v(n) = \beta(n)\cos(n\omega_o + \varphi_2)$, respectively, where
$\varphi_1$ and $\varphi_2$ are random initial phases of the carrier, assumed to be uniformly distributed
in the interval $-\pi$ to $+\pi$. The amplitudes $\alpha(n)$ and $\beta(n)$, as in Example 3.6, are
uncorrelated narrow-band baseband signals. Furthermore, the random phases $\varphi_1$
and $\varphi_2$ are assumed to be independent among themselves as well as with respect
to $\alpha(n)$ and $\beta(n)$.


(i) Using the new definitions of $s(n)$ and $v(n)$, show that the same result as in
Eq. (3.160) is also obtained through ensemble averages.

(ii) Show that, for the present case

$$e_o(n) = \frac{\sigma_\beta^2\,\alpha(n)}{\sigma_\alpha^2 + \sigma_\beta^2}\left[\cos(n\omega_o + \varphi_1) - \cos((n - \delta_o)\omega_o + \varphi_1)\right] - \frac{\sigma_\alpha^2\,\beta(n)}{\sigma_\alpha^2 + \sigma_\beta^2}\left[\cos(n\omega_o + \varphi_2) - \cos((n - \delta_o)\omega_o + \varphi_2)\right]$$

(iii) Use the result in (ii) to verify the power inversion formula, in the present
case.


4


Eigenanalysis and Performance Surface


The transversal Wiener filter was introduced in the last chapter as a powerful signal pro- 
cessing structure with a unique performance function, which has many desirable features 
for adaptive filtering applications. In particular, it was noted that the performance func- 
tion of the transversal Wiener filter has a unique global minimum point, which can be 
easily obtained using the second-order moments of the underlying processes. This is a 
consequence of the fact that the performance function of the transversal Wiener filter is 
a convex quadratic function of its tap weights. 

Our goal in this chapter is to analyze the quadratic performance function of the
transversal Wiener filter in detail. We get a clear picture of the shape of the performance
function when it is visualized as a surface in the (N + 1)-dimensional space of variables
consisting of the filter tap weights, as the first N axes, and the performance function, as the
(N + 1)th axis. This is called the performance surface.

The shape of the performance surface of a transversal Wiener filter is closely related to 
the eigenvalues of the correlation matrix R of the filter tap inputs. Hence, we start with 
a thorough discussion on the eigenvalues and eigenvectors of the correlation matrix R. 


4.1 Eigenvalues and Eigenvectors 


Let

$$\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^H(n)] \tag{4.1}$$

be the N-by-N correlation matrix of a complex-valued wide-sense stationary
stochastic process represented by the N-by-1 observation vector
$\mathbf{x}(n) = [x(n)\ \ x(n-1)\ \cdots\ x(n-N+1)]^T$, where the superscripts H and T denote Hermitian and
transpose, respectively.

A nonzero N-by-1 vector $\mathbf{q}$ is said to be an eigenvector of $\mathbf{R}$ if it satisfies the equation

$$\mathbf{R}\mathbf{q} = \lambda\mathbf{q} \tag{4.2}$$

for some scalar constant $\lambda$. The scalar $\lambda$ is called the eigenvalue of $\mathbf{R}$ associated with the
eigenvector $\mathbf{q}$. We note that if $\mathbf{q}$ is an eigenvector of $\mathbf{R}$, then for any nonzero scalar $a$,
$a\mathbf{q}$ is also an eigenvector of $\mathbf{R}$, corresponding to the same eigenvalue, $\lambda$. This is easily
verified by multiplying Eq. (4.2) through by $a$.

To find the eigenvalues and eigenvectors of $\mathbf{R}$, we note that Eq. (4.2) may be
rearranged as

$$(\mathbf{R} - \lambda\mathbf{I})\mathbf{q} = \mathbf{0} \tag{4.3}$$

where $\mathbf{I}$ is the N-by-N identity matrix, and $\mathbf{0}$ is the N-by-1 null vector. For a solution
other than the trivial one, $\mathbf{q} = \mathbf{0}$, to exist, the matrix $\mathbf{R} - \lambda\mathbf{I}$ has to be singular. This implies

$$\det(\mathbf{R} - \lambda\mathbf{I}) = 0 \tag{4.4}$$

where $\det(\cdot)$ denotes determinant. Equation (4.4) is called the characteristic equation
of the matrix $\mathbf{R}$. The characteristic equation (4.4), when expanded, is an Nth order
equation in the unknown parameter $\lambda$. The roots of this equation, which may be called
$\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$, are the eigenvalues of $\mathbf{R}$. When the $\lambda_i$'s are distinct, $\mathbf{R} - \lambda_i\mathbf{I}$, for
$i = 0, 1, \ldots, N-1$, will be of rank $N - 1$. This leads to N eigenvectors $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{N-1}$, for
the matrix $\mathbf{R}$, which are unique up to a scale factor. On the other hand, when the
characteristic equation (4.4) has repeated roots, the matrix $\mathbf{R}$ is said to have degenerate eigenvalues.
In that case, the eigenvectors of $\mathbf{R}$ will not be unique. For example, if $\lambda_m$ is an
eigenvalue of $\mathbf{R}$ repeated $p$ times, then the rank of $\mathbf{R} - \lambda_m\mathbf{I}$ is $N - p$, and thus the solution
of the equation $(\mathbf{R} - \lambda_m\mathbf{I})\mathbf{q}_m = \mathbf{0}$ can be any vector in a p-dimensional subspace of the
N-dimensional complex vector space. This, in general, creates some confusion in
eigenanalysis of matrices, which should be handled carefully. To prevent such confusion, in
the discussion that follows, wherever necessary, we start with the case that the eigenvalues
of $\mathbf{R}$ are distinct. The results will then be extended to the case of repeated eigenvalues.
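For a real symmetric 2-by-2 matrix, the characteristic equation (4.4) is a quadratic and can be solved by hand. The sketch below does this for one hypothetical correlation matrix and then checks Eq. (4.2) for each eigenpair. The function name and the numerical values are illustrative only, and the code assumes a nonzero off-diagonal element so that the two eigenvalues are distinct.

```python
import math

def eig_sym_2x2(R):
    """Eigenvalues and unit-length eigenvectors of a real symmetric 2-by-2 matrix.

    Assumes a nonzero off-diagonal element, which guarantees distinct eigenvalues.
    """
    (a, b), (_, d) = R
    if b == 0.0:
        raise ValueError("this sketch assumes a nonzero off-diagonal element")
    # det(R - lam I) = lam^2 - (a + d) lam + (a d - b^2) = 0
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        # First row of (R - lam I) q = 0:  (a - lam) q0 + b q1 = 0
        q = (b, lam - a)
        norm = math.hypot(q[0], q[1])
        pairs.append((lam, (q[0] / norm, q[1] / norm)))
    return pairs

R = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical correlation matrix
pairs = eig_sym_2x2(R)

# Residual of Eq. (4.2), R q - lam q, for each eigenpair.
residuals = [max(abs(R[i][0] * q[0] + R[i][1] * q[1] - lam * q[i]) for i in range(2))
             for lam, q in pairs]
print(pairs, residuals)
```

Scaling either eigenvector by any nonzero constant leaves the residual at zero, which is the "unique up to a scale factor" statement made above.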


4.2 Properties of Eigenvalues and Eigenvectors 


We discuss the various properties of the eigenvalues and eigenvectors of the correlation
matrix $\mathbf{R}$. Some of the properties derived here are directly related to the fact that the
correlation matrix $\mathbf{R}$ is Hermitian and nonnegative definite. A matrix $\mathbf{A}$, in general, is
said to be Hermitian if $\mathbf{A} = \mathbf{A}^H$. For the correlation matrix $\mathbf{R}$, this is observed by direct
inspection of Eq. (4.1). The N-by-N Hermitian matrix $\mathbf{A}$ is said to be nonnegative definite
or positive semidefinite, if

$$\mathbf{v}^H\mathbf{A}\mathbf{v} \geq 0 \tag{4.5}$$

for any N-by-1 vector $\mathbf{v}$. The fact that $\mathbf{A}$ is Hermitian implies that $\mathbf{v}^H\mathbf{A}\mathbf{v}$ is real-valued.
This can be seen easily, if we note that with the dimensions specified above $\mathbf{v}^H\mathbf{A}\mathbf{v}$ is a
scalar and $(\mathbf{v}^H\mathbf{A}\mathbf{v})^* = (\mathbf{v}^H\mathbf{A}\mathbf{v})^H = \mathbf{v}^H\mathbf{A}\mathbf{v}$. For the correlation matrix $\mathbf{R}$, to show that
$\mathbf{v}^H\mathbf{R}\mathbf{v}$ can never be negative, we substitute for $\mathbf{R}$ from Eq. (4.1) to obtain

$$\mathbf{v}^H\mathbf{R}\mathbf{v} = \mathbf{v}^H E[\mathbf{x}(n)\mathbf{x}^H(n)]\mathbf{v} = E[\mathbf{v}^H\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{v}] \tag{4.6}$$

We note that $\mathbf{v}^H\mathbf{x}(n)$ and $\mathbf{x}^H(n)\mathbf{v}$ constitute a pair of complex-conjugate scalars. This,
when used in Eq. (4.6), gives

$$\mathbf{v}^H\mathbf{R}\mathbf{v} = E[|\mathbf{v}^H\mathbf{x}(n)|^2] \tag{4.7}$$

which is nonnegative for any vector $\mathbf{v}$.

From Eq. (4.7), we note that when $\mathbf{v}$ is nonzero, the Hermitian form $\mathbf{v}^H\mathbf{R}\mathbf{v}$ may be
zero only when there is a consistent dependency between the elements of the observation
vector $\mathbf{x}(n)$, so that $\mathbf{v}^H\mathbf{x}(n) = 0$, for all observations of $\mathbf{x}(n)$. For a random process
$\{x(n)\}$, this can only happen when $\{x(n)\}$ consists of a sum of $L$ sinusoids with $L < N$.
In practice, we find that this situation is very rare and thus for any nonzero $\mathbf{v}$, $\mathbf{v}^H\mathbf{R}\mathbf{v}$
is almost always positive. We thus say that the correlation matrix $\mathbf{R}$ is almost always
positive definite.
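The identity in Eq. (4.7) can be illustrated with sample averages. In the sketch below, the expectation is replaced by an average over a handful of complex observation vectors, so that $\mathbf{R}$ is a sample estimate; this replacement, the function name, and all the numbers are assumptions made for the illustration. With that substitution, the two sides of Eq. (4.7) agree up to rounding error, and the Hermitian form is real and nonnegative.

```python
def hermitian_form_two_ways(samples, v):
    """Return (v^H R v, mean of |v^H x|^2), with R the sample average of x x^H."""
    N = len(v)
    K = len(samples)
    # Sample estimate of R = E[x x^H]: element (i, j) averages x_i * conj(x_j).
    R = [[sum(x[i] * x[j].conjugate() for x in samples) / K for j in range(N)]
         for i in range(N)]
    form = sum(v[i].conjugate() * R[i][j] * v[j]
               for i in range(N) for j in range(N))
    power = sum(abs(sum(v[i].conjugate() * x[i] for i in range(N))) ** 2
                for x in samples) / K
    return form, power

# Hypothetical observation vectors (N = 3) and test vector v.
samples = [
    (1 + 2j, 0.5 - 1j, -1j),
    (0.3 - 0.7j, 2 + 0j, 1 + 1j),
    (-1.5 + 0.2j, 0.1 + 0.4j, 0.9 - 0.3j),
    (0.8 + 0.8j, -0.6j, -0.2 + 1.1j),
]
v = (1 - 1j, 0.5j, -2 + 0j)

form, power = hermitian_form_two_ways(samples, v)
print(form, power)  # form has a negligible imaginary part and matches power
```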

With this background, we are now prepared to discuss the properties of the eigenvalues 
and eigenvectors of the correlation matrix R. 


Property 1: The eigenvalues of the correlation matrix $\mathbf{R}$ are all real and nonnegative.


Consider an eigenvector $\mathbf{q}_i$ of $\mathbf{R}$ and its corresponding eigenvalue $\lambda_i$. These two are
related according to the equation

$$\mathbf{R}\mathbf{q}_i = \lambda_i\mathbf{q}_i \tag{4.8}$$

Premultiplying Eq. (4.8) by $\mathbf{q}_i^H$ and noting that $\lambda_i$ is a scalar, we get

$$\mathbf{q}_i^H\mathbf{R}\mathbf{q}_i = \lambda_i\mathbf{q}_i^H\mathbf{q}_i \tag{4.9}$$

The quantity $\mathbf{q}_i^H\mathbf{q}_i$ on the right-hand side is always real and positive, because it is the
squared length of the vector $\mathbf{q}_i$. Furthermore, the Hermitian form $\mathbf{q}_i^H\mathbf{R}\mathbf{q}_i$ on the left-hand
side of Eq. (4.9) is always real and nonnegative, because the correlation matrix $\mathbf{R}$ is
nonnegative definite. Noting these, it follows from Eq. (4.9) that

$$\lambda_i \geq 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{4.10}$$


Property 2: If $\mathbf{q}_i$ and $\mathbf{q}_j$ are two eigenvectors of the correlation matrix $\mathbf{R}$ that correspond
to two of its distinct eigenvalues, then

$$\mathbf{q}_i^H\mathbf{q}_j = 0 \tag{4.11}$$

In other words, eigenvectors associated with the distinct eigenvalues of the correlation
matrix $\mathbf{R}$ are mutually orthogonal.


Let $\lambda_i$ and $\lambda_j$ be the distinct eigenvalues corresponding to the eigenvectors $\mathbf{q}_i$ and $\mathbf{q}_j$,
respectively. We have

$$\mathbf{R}\mathbf{q}_i = \lambda_i\mathbf{q}_i \tag{4.12}$$

and

$$\mathbf{R}\mathbf{q}_j = \lambda_j\mathbf{q}_j \tag{4.13}$$

Applying conjugate transpose on both sides of Eq. (4.12) and noting that $\lambda_i$ is a real
scalar and for the Hermitian matrix $\mathbf{R}$, $\mathbf{R}^H = \mathbf{R}$, we obtain

$$\mathbf{q}_i^H\mathbf{R} = \lambda_i\mathbf{q}_i^H \tag{4.14}$$

Premultiplying Eq. (4.13) by $\mathbf{q}_i^H$, postmultiplying Eq. (4.14) by $\mathbf{q}_j$, and subtracting the
two resulting equations gives

$$(\lambda_i - \lambda_j)\mathbf{q}_i^H\mathbf{q}_j = 0 \tag{4.15}$$

Noting that $\lambda_i$ and $\lambda_j$ are distinct, this gives Eq. (4.11).
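For readers who want the last step spelled out, the two products mentioned above can be written explicitly. This is only a restatement of the manipulation already described, using the same symbols:

```latex
% Eq. (4.13) premultiplied by q_i^H:
\mathbf{q}_i^H \mathbf{R} \mathbf{q}_j = \lambda_j \, \mathbf{q}_i^H \mathbf{q}_j
% Eq. (4.14) postmultiplied by q_j:
\mathbf{q}_i^H \mathbf{R} \mathbf{q}_j = \lambda_i \, \mathbf{q}_i^H \mathbf{q}_j
% Subtracting the first line from the second, the left-hand sides cancel:
0 = (\lambda_i - \lambda_j) \, \mathbf{q}_i^H \mathbf{q}_j
```

Since $\lambda_i - \lambda_j \neq 0$ for distinct eigenvalues, the inner product $\mathbf{q}_i^H\mathbf{q}_j$ must vanish.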
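Property 3: Let $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{N-1}$ be the eigenvectors associated with the distinct
eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$ of the N-by-N correlation matrix $\mathbf{R}$, respectively. Assume the
eigenvectors $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{N-1}$ are all normalized to have a length of unity, and define the
N-by-N matrix

$$\mathbf{Q} = [\mathbf{q}_0\ \ \mathbf{q}_1\ \cdots\ \mathbf{q}_{N-1}] \tag{4.16}$$

$\mathbf{Q}$ is then a unitary matrix, i.e.,

$$\mathbf{Q}^H\mathbf{Q} = \mathbf{I} \tag{4.17}$$

This implies that the matrices $\mathbf{Q}$ and $\mathbf{Q}^H$ are the inverse of each other.


To show this property, we note that the $ij$th element of the N-by-N matrix $\mathbf{Q}^H\mathbf{Q}$ is the
product of the $i$th row of $\mathbf{Q}^H$, which is $\mathbf{q}_i^H$, and the $j$th column of $\mathbf{Q}$, which is $\mathbf{q}_j$. That is,

$$\text{the } ij\text{th element of } \mathbf{Q}^H\mathbf{Q} = \mathbf{q}_i^H\mathbf{q}_j \tag{4.18}$$

Noting this, Eq. (4.17) follows immediately from Property 2, which makes the off-diagonal
elements zero, and from the unit-length normalization of the eigenvectors, which makes the
diagonal elements equal to one.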

In cases where the correlation matrix R has one or more repeated eigenvalues, as was 
noted earlier, attached to each of these repeated eigenvalues there is a subspace of the 
same dimension as the multiplicity of the eigenvalue in which any vector is an eigenvector 
of R. From Property 2, we can say that the subspaces that belong to distinct eigenvalues 
are orthogonal. Moreover, within each subspace one can always find a set of orthogonal 
basis vectors that span the whole subspace. Clearly, such a set is not unique, but can 
always be chosen. This means, for any repeated eigenvalue with multiplicity p, one can 
always find a set of p orthogonal eigenvectors. Noting this, we can say, in general, that 
for any N-by-N correlation matrix R, one can always make a unitary matrix Q whose 
columns are made up of a set of eigenvectors of R. 


Property 4: For any N-by-N correlation matrix R, one can always find a set of mutually 
orthogonal eigenvectors. Such a set may be used as a basis to express any vector in the 
N-dimensional space of complex vectors. 

This property follows from the earlier discussion. 


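Property 5: Unitary Similarity Transformation. The correlation matrix $\mathbf{R}$ can always be decomposed as

$$\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{\mathrm{H}} \tag{4.19}$$

where the matrix $\mathbf{Q}$ is made up of a set of unit-length orthogonal eigenvectors of $\mathbf{R}$ as specified in Eqs. (4.16) and (4.17),

$$\boldsymbol{\Lambda} = \begin{bmatrix} \lambda_0 & 0 & \cdots & 0 \\ 0 & \lambda_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{N-1} \end{bmatrix} \tag{4.20}$$

and the order of the eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$ matches that of the corresponding eigenvectors in the columns of $\mathbf{Q}$.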


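To prove this property, we note that the set of equations

$$\mathbf{R}\mathbf{q}_i = \lambda_i\mathbf{q}_i, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{4.21}$$

may be packed together as a single matrix equation

$$\mathbf{R}\mathbf{Q} = \mathbf{Q}\boldsymbol{\Lambda} \tag{4.22}$$

Then, postmultiplying Eq. (4.22) by $\mathbf{Q}^{\mathrm{H}}$ and noting that $\mathbf{Q}\mathbf{Q}^{\mathrm{H}} = \mathbf{I}$, we get Eq. (4.19). The right-hand side of Eq. (4.19) may be expanded as

$$\mathbf{R} = \sum_{i=0}^{N-1} \lambda_i\mathbf{q}_i\mathbf{q}_i^{\mathrm{H}} \tag{4.23}$$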
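Property 6: Let $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$ be the eigenvalues of the correlation matrix $\mathbf{R}$. Then,

$$\mathrm{tr}[\mathbf{R}] = \sum_{i=0}^{N-1} \lambda_i \tag{4.24}$$

where $\mathrm{tr}[\mathbf{R}]$ denotes the trace of $\mathbf{R}$ and is defined as the sum of the diagonal elements of $\mathbf{R}$.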


Taking the trace on both sides of Eq. (4.19), we get

$$\mathrm{tr}[\mathbf{R}] = \mathrm{tr}[\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{\mathrm{H}}] \tag{4.25}$$


To proceed, we may use the following result of matrix algebra. If A and B are N-by-M 
and M-by-N matrices, respectively, then, 


tr[AB] = tr[BA] (4.26) 


Using this result, we may swap $\mathbf{Q}\boldsymbol{\Lambda}$ and $\mathbf{Q}^{\mathrm{H}}$ on the right-hand side of Eq. (4.25). Then, noting that $\mathbf{Q}^{\mathrm{H}}\mathbf{Q} = \mathbf{I}$, Eq. (4.25) is simplified as

$$\mathrm{tr}[\mathbf{R}] = \mathrm{tr}[\boldsymbol{\Lambda}] \tag{4.27}$$

Using the definition (4.20) in Eq. (4.27) completes the proof.

An alternative way of proving the above result is by direct expansion of Eq. (4.4); see Problem P4.8. This proof shows that the identity (4.24) is not limited to Hermitian matrices. It applies to any square matrix.
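
As an illustration of Properties 5 and 6, the short Python sketch below rebuilds a correlation matrix from its eigenpairs through Eq. (4.23) and compares the trace with the sum of the eigenvalues, Eq. (4.24). The 2-by-2 matrix with unit diagonal and off-diagonal value `alpha = 0.5` is a hypothetical example chosen because its eigenpairs are known in closed form; it is not the only matrix for which the check works.

```python
import math

alpha = 0.5  # hypothetical correlation between adjacent samples

# A 2-by-2 real correlation matrix for a unit-power input.
R = [[1.0, alpha], [alpha, 1.0]]

# Its eigenpairs, known in closed form for this symmetric structure.
s = 1.0 / math.sqrt(2.0)
eigenpairs = [(1.0 + alpha, [s, s]), (1.0 - alpha, [s, -s])]

# Eq. (4.23): R = sum_i lambda_i q_i q_i^H (plain transpose for real vectors).
R_rebuilt = [[sum(lam * q[r] * q[c] for lam, q in eigenpairs)
              for c in range(2)] for r in range(2)]

trace_R = R[0][0] + R[1][1]
sum_of_eigenvalues = sum(lam for lam, _ in eigenpairs)

print("rebuilt R:", R_rebuilt)
print("tr[R] =", trace_R, " sum of eigenvalues =", sum_of_eigenvalues)
```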


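Property 7: Minimax Theorem.¹ The distinct eigenvalues $\lambda_0 > \lambda_1 > \cdots > \lambda_{N-1}$ of the correlation matrix $\mathbf{R}$ of an observation vector $\mathbf{x}(n)$, and their corresponding eigenvectors, $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{N-1}$, may be obtained through the following optimization procedure:

$$\lambda_{\max} = \lambda_0 = \max_{\|\mathbf{q}_0\| = 1} E[|\mathbf{q}_0^{\mathrm{H}}\mathbf{x}(n)|^2] \tag{4.28}$$

and for $i = 1, 2, \ldots, N-1$

$$\lambda_i = \max_{\|\mathbf{q}_i\| = 1} E[|\mathbf{q}_i^{\mathrm{H}}\mathbf{x}(n)|^2] \tag{4.29}$$

with

$$\mathbf{q}_i^{\mathrm{H}}\mathbf{q}_j = 0, \quad \text{for } 0 \le j < i \tag{4.30}$$

where $\|\mathbf{q}_i\| = \sqrt{\mathbf{q}_i^{\mathrm{H}}\mathbf{q}_i}$ denotes the length or norm of the complex vector $\mathbf{q}_i$.

Alternatively, the following procedure may also be used to obtain the eigenvalues of the correlation matrix $\mathbf{R}$, in ascending order:

$$\lambda_{\min} = \lambda_{N-1} = \min_{\|\mathbf{q}_{N-1}\| = 1} E[|\mathbf{q}_{N-1}^{\mathrm{H}}\mathbf{x}(n)|^2] \tag{4.31}$$

and for $i = N-2, \ldots, 1, 0$

$$\lambda_i = \min_{\|\mathbf{q}_i\| = 1} E[|\mathbf{q}_i^{\mathrm{H}}\mathbf{x}(n)|^2] \tag{4.32}$$

with

$$\mathbf{q}_i^{\mathrm{H}}\mathbf{q}_j = 0, \quad \text{for } i < j \le N-1 \tag{4.33}$$

¹ In matrix algebra literature, the minimax theorem is usually stated using the Hermitian form $\mathbf{q}_i^{\mathrm{H}}\mathbf{R}\mathbf{q}_i$ instead of $E[|\mathbf{q}_i^{\mathrm{H}}\mathbf{x}(n)|^2]$ (Haykin, 1991). The method that we have adopted here is to simplify some of our discussions in the following chapters. This method has been adopted from Farhang-Boroujeny and Gazor (1992).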


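Let us assume that the set of vectors that satisfy the minimax optimization procedure are the unit-length vectors $\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_{N-1}$. From Property 4, we recall that the eigenvectors $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{N-1}$ are a set of basis vectors for the N-dimensional complex vector space. This implies that we may write

$$\mathbf{p}_i = \sum_{j=0}^{N-1} \alpha_{ij}\mathbf{q}_j, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{4.34}$$

where the complex-valued coefficients $\alpha_{ij}$ are the coordinates of the complex vectors $\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_{N-1}$ in the N-dimensional space spanned by the basis vectors $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{N-1}$.

Let $\mathbf{p}_0$ be the unit-length complex vector that maximizes $E[|\mathbf{p}_0^{\mathrm{H}}\mathbf{x}(n)|^2]$. We note that

$$\begin{aligned} E[|\mathbf{p}_0^{\mathrm{H}}\mathbf{x}(n)|^2] &= E[\mathbf{p}_0^{\mathrm{H}}\mathbf{x}(n)\mathbf{x}^{\mathrm{H}}(n)\mathbf{p}_0] \\ &= \mathbf{p}_0^{\mathrm{H}}E[\mathbf{x}(n)\mathbf{x}^{\mathrm{H}}(n)]\mathbf{p}_0 \\ &= \mathbf{p}_0^{\mathrm{H}}\mathbf{R}\mathbf{p}_0 \end{aligned} \tag{4.35}$$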
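Substituting Eq. (4.23) in Eq. (4.35), we obtain

$$E[|\mathbf{p}_0^{\mathrm{H}}\mathbf{x}(n)|^2] = \sum_{i=0}^{N-1} \lambda_i\mathbf{p}_0^{\mathrm{H}}\mathbf{q}_i\mathbf{q}_i^{\mathrm{H}}\mathbf{p}_0 \tag{4.36}$$

Using Property 3, we get

$$\mathbf{p}_0^{\mathrm{H}}\mathbf{q}_i = \alpha_{0i}^{*} \tag{4.37}$$

and

$$\mathbf{q}_i^{\mathrm{H}}\mathbf{p}_0 = \alpha_{0i} \tag{4.38}$$

Substituting these in Eq. (4.36), we obtain

$$E[|\mathbf{p}_0^{\mathrm{H}}\mathbf{x}(n)|^2] = \sum_{i=0}^{N-1} \lambda_i|\alpha_{0i}|^2 \tag{4.39}$$

On the other hand, we may note that, because $\lambda_0 > \lambda_1, \lambda_2, \ldots, \lambda_{N-1}$,

$$\sum_{i=0}^{N-1} \lambda_i|\alpha_{0i}|^2 \le \lambda_0\sum_{i=0}^{N-1} |\alpha_{0i}|^2 \tag{4.40}$$

where the equality holds (i.e., $\mathbf{p}_0$ maximizes $E[|\mathbf{p}_0^{\mathrm{H}}\mathbf{x}(n)|^2]$) only when $\alpha_{0i} = 0$ for $i = 1, 2, \ldots, N-1$. Furthermore, the fact that $\mathbf{p}_0$ is constrained to the length of unity implies that

$$\sum_{i=0}^{N-1} |\alpha_{0i}|^2 = 1 \tag{4.41}$$

Application of Eqs. (4.39) and (4.41) in Eq. (4.40) gives

$$\max_{\|\mathbf{p}_0\| = 1} E[|\mathbf{p}_0^{\mathrm{H}}\mathbf{x}(n)|^2] = \lambda_0 \tag{4.42}$$

and this is achieved when

$$\mathbf{p}_0 = \alpha_{00}\mathbf{q}_0 \quad \text{with } |\alpha_{00}| = 1 \tag{4.43}$$

We may note that the factor $\alpha_{00}$ is arbitrary and has no significance, as it does not affect the maximum in Eq. (4.42) because of the constraint $|\alpha_{00}| = 1$ in Eq. (4.43). Hence, without any loss of generality, we assume $\alpha_{00} = 1$. This gives

$$\mathbf{p}_0 = \mathbf{q}_0 \tag{4.44}$$

as a solution to the maximization problem

$$\max_{\|\mathbf{p}_0\| = 1} E[|\mathbf{p}_0^{\mathrm{H}}\mathbf{x}(n)|^2] \tag{4.45}$$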
The fact that the solution obtained here is not unique follows from the more general fact that the eigenvector corresponding to an eigenvalue is always arbitrary to the extent of a scalar multiplier factor. Here, the scalar multiplier is constrained to have a modulus of unity to satisfy the condition that both the $\mathbf{p}_i$ and $\mathbf{q}_i$ vectors are constrained to the length of unity.
In proceeding to find $\mathbf{p}_1$, we note that the constraint (4.30), for $i = 1$, implies that

$$\mathbf{p}_1^{\mathrm{H}}\mathbf{q}_0 = 0 \tag{4.46}$$

This, in turn, requires $\mathbf{p}_1$ to be limited to a linear combination of $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_{N-1}$ only. That is,

$$\mathbf{p}_1 = \sum_{j=1}^{N-1} \alpha_{1j}\mathbf{q}_j \tag{4.47}$$

Noting this and following a procedure similar to the one used to find $\mathbf{p}_0$, we get

$$\max_{\|\mathbf{p}_1\| = 1} E[|\mathbf{p}_1^{\mathrm{H}}\mathbf{x}(n)|^2] = \lambda_1 \tag{4.48}$$

and

$$\mathbf{p}_1 = \mathbf{q}_1 \tag{4.49}$$


Following the same procedure for the rest of the eigenvalues and eigenvectors of R 
completes the proof of the first procedure of the minimax theorem. 

The alternative procedure of the minimax theorem, suggested by Eqs. (4.31)—(4.33), can 
also be proved in a similar way. 
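
The maximization and minimization in the minimax theorem can be visualized on a two-dimensional real case, where every unit-length vector can be written as [cos t, sin t]. The sketch below sweeps t and evaluates the output power through the identity of Eq. (4.35). The matrix with unit diagonal and off-diagonal value `alpha = 0.5`, and the grid of 3600 angles, are hypothetical choices made for illustration only.

```python
import math

alpha = 0.5
R = [[1.0, alpha], [alpha, 1.0]]   # hypothetical 2-by-2 real correlation matrix

def output_power(q):
    """E[|q^H x(n)|^2] = q^H R q for a real vector q, as in Eq. (4.35)."""
    return sum(q[r] * R[r][c] * q[c] for r in range(2) for c in range(2))

# Sweep unit-length vectors q = [cos(t), sin(t)] over half a turn.
steps = 3600
powers = []
for k in range(steps):
    t = math.pi * k / steps
    powers.append((output_power([math.cos(t), math.sin(t)]), t))

p_max, t_max = max(powers)
p_min, t_min = min(powers)

print("max power %.6f at %.1f degrees" % (p_max, math.degrees(t_max)))
print("min power %.6f at %.1f degrees" % (p_min, math.degrees(t_min)))
```

For this matrix the extreme powers coincide with the eigenvalues 1 + alpha and 1 - alpha, and the maximizing and minimizing directions are orthogonal, in line with constraints (4.30) and (4.33).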


Property 8: The eigenvalues of the correlation matrix $\mathbf{R}$ of a discrete-time stationary stochastic process $\{x(n)\}$ are bounded by the minimum and maximum values of the power spectral density, $\Phi_{xx}(e^{j\omega})$, of the process.


The minimax theorem, as introduced in Property 7, views the eigenvectors of the corre- 
lation matrix of a discrete-time stochastic process as the conjugate of a set of tap-weight 
vectors corresponding to a set of FIR filters that are optimized in the minimax sense 
introduced there. Such filters are conveniently called eigenfilters. The minimax optimiza- 
tion procedure suggests that the eigenfilters may be obtained through a maximization or a 
minimization procedure, which looks at the output powers of the eigenfilters. In particular, 
the maximum and minimum eigenvalues of R may be obtained by solving the following 
two independent problems, respectively: 


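$$\lambda_{\max} = \max_{\|\mathbf{q}_0\| = 1} E[|\mathbf{q}_0^{\mathrm{H}}\mathbf{x}(n)|^2] \tag{4.50}$$

and

$$\lambda_{\min} = \min_{\|\mathbf{q}_{N-1}\| = 1} E[|\mathbf{q}_{N-1}^{\mathrm{H}}\mathbf{x}(n)|^2] \tag{4.51}$$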


Let $Q_i(z)$ denote the system function of the ith eigenfilter of the discrete-time stochastic process $\{x(n)\}$. Using Parseval's relation (Eq. (2.26) of Chapter 2), we obtain

$$\|\mathbf{q}_i\|^2 = \mathbf{q}_i^{\mathrm{H}}\mathbf{q}_i = \frac{1}{2\pi}\int_{-\pi}^{\pi} |Q_i(e^{j\omega})|^2\,d\omega \tag{4.52}$$

With the constraint $\|\mathbf{q}_i\| = 1$, this gives

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |Q_i(e^{j\omega})|^2\,d\omega = 1 \tag{4.53}$$

On the other hand, if we define $x'_i(n)$ as the output of the ith eigenfilter of $\mathbf{R}$, that is,

$$x'_i(n) = \mathbf{q}_i^{\mathrm{H}}\mathbf{x}(n) \tag{4.54}$$

then, using the power spectral density relationships provided in Chapter 2, we obtain

$$\Phi_{x'_i x'_i}(e^{j\omega}) = |Q_i(e^{j\omega})|^2\,\Phi_{xx}(e^{j\omega}) \tag{4.55}$$

We may also recall from the results presented in Chapter 2 that

$$E[|x'_i(n)|^2] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{x'_i x'_i}(e^{j\omega})\,d\omega \tag{4.56}$$

Substituting Eq. (4.55) in Eq. (4.56), we obtain

$$E[|x'_i(n)|^2] = \frac{1}{2\pi}\int_{-\pi}^{\pi} |Q_i(e^{j\omega})|^2\,\Phi_{xx}(e^{j\omega})\,d\omega \tag{4.57}$$


This result has the following interpretation. The signal power at the output of the ith 
eigenfilter of the correlation matrix R of a stochastic process {x(n)} is given by a weighted 
average of the power spectral density of {x(n)}. The weighting function used for averaging 
is the squared magnitude response of the corresponding eigenfilter. 
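
The weighted-average interpretation of Eq. (4.57) can be checked by numerical integration. The sketch below assumes, purely for illustration, an input whose power spectral density is (1 - alpha^2) / |1 - alpha e^{-jw}|^2, for which the autocorrelation at lag k is alpha^|k|, so that the two-tap correlation matrix has eigenvalues 1 + alpha and 1 - alpha with eigenvectors proportional to [1, 1] and [1, -1]. The function names and the number of integration points are hypothetical choices.

```python
import cmath
import math

alpha = 0.5  # hypothetical pole of a first-order coloring filter

def psd(w):
    """Assumed input PSD: (1 - alpha^2) / |1 - alpha e^{-jw}|^2."""
    return (1.0 - alpha ** 2) / abs(1.0 - alpha * cmath.exp(-1j * w)) ** 2

def filter_output_power(q, points=4096):
    """Approximate (1/2pi) * integral of |Q(e^{jw})|^2 * PSD(w) over one period."""
    total = 0.0
    for k in range(points):
        w = -math.pi + 2.0 * math.pi * k / points
        response = sum(q_m * cmath.exp(-1j * w * m) for m, q_m in enumerate(q))
        total += abs(response) ** 2 * psd(w)
    return total / points

s = 1.0 / math.sqrt(2.0)
q0 = [s, s]    # unit-length two-tap low-pass filter
q1 = [s, -s]   # unit-length two-tap high-pass filter

print("output power of q0:", filter_output_power(q0))
print("output power of q1:", filter_output_power(q1))
```

The two printed powers should agree with 1 + alpha and 1 - alpha to within the accuracy of the numerical integration.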

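Using the above results, Eq. (4.50) may be written as

$$\lambda_{\max} = \max \frac{1}{2\pi}\int_{-\pi}^{\pi} |Q_0(e^{j\omega})|^2\,\Phi_{xx}(e^{j\omega})\,d\omega \tag{4.58}$$

subject to the constraint

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |Q_0(e^{j\omega})|^2\,d\omega = 1 \tag{4.59}$$

We may also note that

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |Q_0(e^{j\omega})|^2\,\Phi_{xx}(e^{j\omega})\,d\omega \le \Phi_{xx}^{\max}\,\frac{1}{2\pi}\int_{-\pi}^{\pi} |Q_0(e^{j\omega})|^2\,d\omega \tag{4.60}$$

where

$$\Phi_{xx}^{\max} \triangleq \max_{-\pi \le \omega \le \pi} \Phi_{xx}(e^{j\omega}) \tag{4.61}$$

With the constraint (4.59), Eq. (4.60) simplifies to

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |Q_0(e^{j\omega})|^2\,\Phi_{xx}(e^{j\omega})\,d\omega \le \Phi_{xx}^{\max} \tag{4.62}$$

Using Eq. (4.62) in Eq. (4.58), we obtain

$$\lambda_{\max} \le \Phi_{xx}^{\max} \tag{4.63}$$

Following a similar procedure, we may also find that

$$\lambda_{\min} \ge \Phi_{xx}^{\min} \tag{4.64}$$

where

$$\Phi_{xx}^{\min} \triangleq \min_{-\pi \le \omega \le \pi} \Phi_{xx}(e^{j\omega}) \tag{4.65}$$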


Property 9: Let $\mathbf{x}(n)$ be an observation vector with the correlation matrix $\mathbf{R}$. Assume that $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{N-1}$ are a set of orthogonal eigenvectors of $\mathbf{R}$ and the matrix $\mathbf{Q}$ is defined as in Eq. (4.16). Then, the elements of the vector

$$\mathbf{x}'(n) = \mathbf{Q}^{\mathrm{H}}\mathbf{x}(n) \tag{4.66}$$

constitute a set of uncorrelated random variables. The transformation defined by Eq. (4.66) is called the Karhunen-Loève transform.


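Using Eq. (4.66), we obtain

$$E[\mathbf{x}'(n)\mathbf{x}'^{\mathrm{H}}(n)] = \mathbf{Q}^{\mathrm{H}}E[\mathbf{x}(n)\mathbf{x}^{\mathrm{H}}(n)]\mathbf{Q} = \mathbf{Q}^{\mathrm{H}}\mathbf{R}\mathbf{Q} \tag{4.67}$$

Substituting for $\mathbf{R}$ from Eq. (4.19) and assuming that the eigenvectors $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{N-1}$ are normalized to the length of unity,² so that $\mathbf{Q}^{\mathrm{H}}\mathbf{Q} = \mathbf{I}$, we obtain

$$E[\mathbf{x}'(n)\mathbf{x}'^{\mathrm{H}}(n)] = \boldsymbol{\Lambda} \tag{4.68}$$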


Noting that $\boldsymbol{\Lambda}$ is a diagonal matrix, this clearly shows that the elements of $\mathbf{x}'(n)$ are uncorrelated with one another.

It is worth noting that the ith element of $\mathbf{x}'(n)$ is the output of the ith eigenfilter of the correlation matrix of the process $\{x(n)\}$, that is, the variable $x'_i(n)$ as defined by Eq. (4.54). Thus, an alternative way of stating Property 9 is to say that the eigenfilters associated with a process $x(n)$ may be selected so that their output samples, at any time instant n, constitute a set of mutually orthogonal random variables.

It may also be noted that by premultiplying Eq. (4.66) with $\mathbf{Q}$ and using $\mathbf{Q}\mathbf{Q}^{\mathrm{H}} = \mathbf{I}$, we obtain

$$\mathbf{x}(n) = \mathbf{Q}\mathbf{x}'(n) \tag{4.69}$$

Replacing $\mathbf{x}'(n)$ by the column vector $[x'_0(n) \ \ x'_1(n) \ \cdots \ x'_{N-1}(n)]^{\mathrm{T}}$, and expanding Eq. (4.69) in terms of the elements of $\mathbf{x}'(n)$ and the columns of $\mathbf{Q}$, we get

$$\mathbf{x}(n) = \sum_{i=0}^{N-1} x'_i(n)\mathbf{q}_i \tag{4.70}$$

This is known as the Karhunen-Loève expansion.
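
A small simulation can make the decorrelating effect of Eq. (4.66) concrete. The sketch below is an illustration under stated assumptions: the input is taken to be a real, unit-variance, first-order autoregressive sequence with parameter `alpha = 0.5` (a hypothetical choice), the observation vector has two elements, and the eigenvectors of the resulting 2-by-2 correlation matrix are the normalized vectors [1, 1] and [1, -1]. The sample correlations of the transformed elements are then accumulated.

```python
import math
import random

random.seed(0)
alpha = 0.5          # hypothetical coloring parameter
num_samples = 100000

s = 1.0 / math.sqrt(2.0)
Q = [[s, s], [s, -s]]   # columns are the eigenvectors of [[1, alpha], [alpha, 1]]

x_prev = 0.0
c00 = c01 = c11 = 0.0
for _ in range(num_samples):
    # Unit-variance first-order autoregressive input (illustrative choice).
    x_now = alpha * x_prev + math.sqrt(1.0 - alpha ** 2) * random.gauss(0.0, 1.0)
    # Eq. (4.66) for real data: x'(n) = Q^T [x(n), x(n-1)]^T.
    xp0 = Q[0][0] * x_now + Q[1][0] * x_prev
    xp1 = Q[0][1] * x_now + Q[1][1] * x_prev
    c00 += xp0 * xp0
    c01 += xp0 * xp1
    c11 += xp1 * xp1
    x_prev = x_now

c00, c01, c11 = (c / num_samples for c in (c00, c01, c11))
print("sample E[x'0^2]  :", c00)
print("sample E[x'0 x'1]:", c01)
print("sample E[x'1^2]  :", c11)
```

The cross term should be close to zero, and the two powers should be close to the eigenvalues 1 + alpha and 1 - alpha, in agreement with Eq. (4.68); the match is only approximate because sample averages replace expectations.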


Example 4.1 


Consider a stationary random process $\{x(n)\}$ that is generated by passing a real-valued stationary zero-mean unit-variance white noise process $\{v(n)\}$ through a system with the system function

$$H(z) = \frac{\sqrt{1-\alpha^2}}{1-\alpha z^{-1}} \tag{4.71}$$

where $\alpha$ is a real-valued constant in the range of $-1$ to $+1$. We want to verify some of the results developed above for the process $\{x(n)\}$.

We note that for the unit-variance white noise process $\{v(n)\}$

$$\Phi_{vv}(z) = 1.$$

Also, using Eq. (2.80) of Chapter 2 and noting that $\alpha$ is real-valued, we obtain

$$\Phi_{xx}(z) = H(z)H(z^{-1})\Phi_{vv}(z) = \frac{1-\alpha^2}{(1-\alpha z^{-1})(1-\alpha z)} \tag{4.72}$$

Taking the inverse z-transform, we get

$$\phi_{xx}(k) = \alpha^{|k|}, \quad \text{for } k = \ldots, -2, -1, 0, 1, 2, \ldots \tag{4.73}$$

² This is not necessary for the above property to hold. However, it is a useful assumption as it simplifies our discussion.


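Using this result, we find that the correlation matrix of an N-tap transversal filter with input $\{x(n)\}$ is

$$\mathbf{R} = \begin{bmatrix} 1 & \alpha & \alpha^2 & \cdots & \alpha^{N-1} \\ \alpha & 1 & \alpha & \cdots & \alpha^{N-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \alpha^{N-1} & \alpha^{N-2} & \alpha^{N-3} & \cdots & 1 \end{bmatrix} \tag{4.74}$$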

Next, we present some numerical results, which demonstrate the relationships between the power spectral density of the process $\{x(n)\}$, $\Phi_{xx}(e^{j\omega})$, and its corresponding correlation matrix.

Figure 4.1 shows a set of plots of $\Phi_{xx}(e^{j\omega})$ for values of $\alpha$ = 0, 0.5, and 0.75. We note that $\alpha = 0$ corresponds to the case where $\{x(n)\}$ is white and, therefore, its power spectral density is flat. As $\alpha$ increases from 0 to 1, $\{x(n)\}$ becomes more colored and, for values of $\alpha$ close to 1, most of its energy is concentrated around $\omega = 0$.

From Property 8, we recall that the eigenvalues of the correlation matrix $\mathbf{R}$ are bounded by the minimum and maximum values of $\Phi_{xx}(e^{j\omega})$. To illustrate this, in Figure 4.2a, b, and c, we have plotted the minimum and maximum eigenvalues of $\mathbf{R}$ for values of $\alpha$ = 0.5, 0.75, and 0.9, as N varies from 2 to 20. It may be noted that the limits predicted by the minimum and maximum values of $\Phi_{xx}(e^{j\omega})$ are achieved asymptotically as N increases.

Figure 4.1 Power spectral density of {x(n)} for different values of the parameter α.

Figure 4.2 Minimum and maximum eigenvalues of the correlation matrix for different values of the parameter α: (a) α = 0.5, (b) α = 0.75, and (c) α = 0.9.


However, for values of $\alpha$ close to 1, such limits are approached only when N is very large. This may be explained using the concept of eigenfilters. We note that when $\alpha$ is close to 1, the peak of the power spectral density function $\Phi_{xx}(e^{j\omega})$ is very narrow; see the case of $\alpha = 0.75$ in Figure 4.1. To pick up this peak accurately, an eigenfilter with a very narrow pass-band (i.e., high selectivity) is required. On the other hand, a narrow-band filter can be realized only if the filter length, N, is selected long enough.
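
The behavior described in this example can be reproduced with a short Python sketch. It builds the matrix of Eq. (4.74), estimates its largest and smallest eigenvalues with plain power iteration (a simple stand-in chosen here to avoid external libraries, not a method prescribed by the text), and compares them with the extreme values of the power spectral density, (1 + alpha)/(1 - alpha) at w = 0 and (1 - alpha)/(1 + alpha) at w = pi, obtained by evaluating Eq. (4.72) on the unit circle for 0 <= alpha < 1. The function names, the starting vector, and the iteration count are illustrative assumptions.

```python
import math

def correlation_matrix(alpha, n):
    """N-by-N matrix of Eq. (4.74): element (i, k) equals alpha^|i - k|."""
    return [[alpha ** abs(i - k) for k in range(n)] for i in range(n)]

def mat_vec(a, x):
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]

def dominant_eigenvalue(a, iterations=2000):
    """Power iteration; returns the Rayleigh quotient of the final iterate."""
    x = [1.0 + 0.01 * i for i in range(len(a))]  # arbitrary starting vector
    for _ in range(iterations):
        y = mat_vec(a, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return sum(x_i * y_i for x_i, y_i in zip(x, mat_vec(a, x)))

def extreme_eigenvalues(alpha, n):
    r = correlation_matrix(alpha, n)
    lam_max = dominant_eigenvalue(r)
    # Shift trick: the dominant eigenvalue of (c*I - R) is c - lam_min
    # whenever c is an upper bound on lam_max (largest absolute row sum).
    c = max(sum(abs(v) for v in row) for row in r)
    shifted = [[(c if i == k else 0.0) - r[i][k] for k in range(n)]
               for i in range(n)]
    lam_min = c - dominant_eigenvalue(shifted)
    return lam_min, lam_max

alpha = 0.5
psd_min = (1.0 - alpha) / (1.0 + alpha)   # Eq. (4.72) evaluated at w = pi
psd_max = (1.0 + alpha) / (1.0 - alpha)   # Eq. (4.72) evaluated at w = 0
for n in (2, 5, 10):
    lam_min, lam_max = extreme_eigenvalues(alpha, n)
    print(n, round(lam_min, 4), round(lam_max, 4), "bounds:", psd_min, psd_max)
```

As N grows, the printed extreme eigenvalues should move toward the two bounds without crossing them. Power iteration converges slowly when neighboring eigenvalues are close, so larger N or alpha closer to 1 may call for more iterations or a dedicated eigensolver.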


Example 4.2 


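Consider the case where the input process, $\{x(n)\}$, to an N-tap transversal filter consists of the summation of a zero-mean white noise process, $\{v(n)\}$, and a complex sinusoid, $\{e^{j(\omega_0 n + \theta)}\}$, where $\theta$ is an initial random phase, which varies for different realizations of the process. The correlation matrix of $\{x(n)\}$ is

$$\mathbf{R} = \sigma_v^2\mathbf{I} + \begin{bmatrix} 1 & e^{j\omega_0} & \cdots & e^{j(N-1)\omega_0} \\ e^{-j\omega_0} & 1 & \cdots & e^{j(N-2)\omega_0} \\ \vdots & \vdots & \ddots & \vdots \\ e^{-j(N-1)\omega_0} & e^{-j(N-2)\omega_0} & \cdots & 1 \end{bmatrix} \tag{4.75}$$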


where the first term on the right-hand side is the correlation matrix of the white noise process and the second term is that of the sinusoidal process. We are interested in finding the eigenvalues and eigenvectors of $\mathbf{R}$. These are conveniently obtained through the minimax theorem and the concept of eigenfilters.

Figure 4.3 Power spectral density of the process {x(n)} consisting of a white noise plus a single tone sinusoidal signal.


Figure 4.3 shows the power spectral density of the process $\{x(n)\}$. It consists of a flat level, which is contributed by $\{v(n)\}$, and an impulse at $\omega = \omega_0$ due to the sinusoidal part of $\{x(n)\}$. The eigenfilter that picks up maximum energy of the input is the one that is matched to the sinusoidal part of the input. The coefficients of this filter are the elements of the eigenvector

$$\mathbf{q}_0 = \frac{1}{\sqrt{N}}\left[1 \ \ e^{-j\omega_0} \ \ e^{-j2\omega_0} \ \cdots \ e^{-j(N-1)\omega_0}\right]^{\mathrm{T}} \tag{4.76}$$

The factor $1/\sqrt{N}$ in Eq. (4.76) is to normalize $\mathbf{q}_0$ to the length of unity. The vector $\mathbf{q}_0$ can easily be confirmed to be an eigenvector of $\mathbf{R}$ by evaluating $\mathbf{R}\mathbf{q}_0$ and noting that this gives

$$\mathbf{R}\mathbf{q}_0 = (\sigma_v^2 + N)\mathbf{q}_0 \tag{4.77}$$

This also shows that the eigenvalue corresponding to the eigenvector $\mathbf{q}_0$ is

$$\lambda_0 = \sigma_v^2 + N \tag{4.78}$$
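
Equations (4.77) and (4.78) are easy to confirm numerically. The sketch below builds the matrix of Eq. (4.75) and the vector of Eq. (4.76) for a hypothetical filter length, sinusoid frequency, and noise variance (the values of `N`, `omega0`, and `sigma_v2` are illustrative assumptions) and measures how far R q0 is from (sigma_v^2 + N) q0.

```python
import cmath
import math

N = 4            # hypothetical filter length
omega0 = 0.7     # hypothetical sinusoid frequency in radians per sample
sigma_v2 = 0.5   # hypothetical white-noise variance

# Eq. (4.75): R = sigma_v^2 * I + [e^{j(m-k)omega0}] for row k, column m.
R = [[(sigma_v2 if k == m else 0.0) + cmath.exp(1j * (m - k) * omega0)
      for m in range(N)] for k in range(N)]

# Eq. (4.76): unit-length vector matched to the sinusoid.
q0 = [cmath.exp(-1j * k * omega0) / math.sqrt(N) for k in range(N)]

def mat_vec(a, x):
    return [sum(a_km * x_m for a_km, x_m in zip(row, x)) for row in a]

Rq0 = mat_vec(R, q0)
lam0 = sigma_v2 + N
residual = max(abs(y - lam0 * q) for y, q in zip(Rq0, q0))
print("max |R q0 - (sigma_v^2 + N) q0| =", residual)
```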


Also, from the minimax theorem, we note that the rest of the eigenvectors of $\mathbf{R}$ have to be orthogonal to $\mathbf{q}_0$, that is,

$$\mathbf{q}_0^{\mathrm{H}}\mathbf{q}_i = 0, \quad \text{for } i = 1, 2, \ldots, N-1 \tag{4.79}$$

Using this, it is not difficult (Problem P4.7) to show that

$$\mathbf{R}\mathbf{q}_i = \sigma_v^2\mathbf{q}_i, \quad \text{for } i = 1, 2, \ldots, N-1 \tag{4.80}$$

This result shows that as long as Eq. (4.79) holds, the eigenvectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_{N-1}$ of $\mathbf{R}$ are arbitrary. In other words, any set of vectors that belongs to the subspace orthogonal to the eigenvector $\mathbf{q}_0$ makes an acceptable set for the rest of the eigenvectors of $\mathbf{R}$. Furthermore, the eigenvalues corresponding to these eigenvectors are all equal to $\sigma_v^2$.

4.3 Performance Surface


With the background developed so far, we are now ready to proceed with exploring the 
performance surface of transversal Wiener filters. We start with the case where the filter 
coefficients, input, and desired output are real-valued. The results will then be extended 
to the complex-valued case. 

We recall from Chapter 3 that the performance function of a transversal Wiener filter with a real-valued input sequence $x(n)$ and a desired output sequence $d(n)$ is

$$\xi = \mathbf{w}^{\mathrm{T}}\mathbf{R}\mathbf{w} - 2\mathbf{p}^{\mathrm{T}}\mathbf{w} + E[d^2(n)] \tag{4.81}$$

where the superscript T denotes vector or matrix transpose, $\mathbf{w} = [w_0 \ \ w_1 \ \cdots \ w_{N-1}]^{\mathrm{T}}$ is the filter tap-weight vector, $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^{\mathrm{T}}(n)]$ is the correlation matrix of the filter tap-input vector $\mathbf{x}(n) = [x(n) \ \ x(n-1) \ \cdots \ x(n-N+1)]^{\mathrm{T}}$, and $\mathbf{p} = E[d(n)\mathbf{x}(n)]$ is the cross-correlation vector between $d(n)$ and $\mathbf{x}(n)$. We want to study the shape of the performance function $\xi$ when it is viewed as a surface in the (N + 1)-dimensional Euclidean space constituted by the filter tap weights $w_i$, $i = 0, 1, \ldots, N-1$, and the performance function, $\xi$.

Also, we recall that the optimum value of the Wiener filter tap-weight vector is obtained from the Wiener-Hopf equation

$$\mathbf{R}\mathbf{w}_o = \mathbf{p} \tag{4.82}$$


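The performance function $\xi$ may be rearranged as follows:

$$\xi = \mathbf{w}^{\mathrm{T}}\mathbf{R}\mathbf{w} - \mathbf{w}^{\mathrm{T}}\mathbf{p} - \mathbf{p}^{\mathrm{T}}\mathbf{w} + E[d^2(n)] \tag{4.83}$$

where we have noted that $\mathbf{w}^{\mathrm{T}}\mathbf{p} = \mathbf{p}^{\mathrm{T}}\mathbf{w}$. Next, we substitute for $\mathbf{p}$ in Eq. (4.83) from Eq. (4.82) and add and subtract the term $\mathbf{w}_o^{\mathrm{T}}\mathbf{R}\mathbf{w}_o$ to obtain

$$\xi = \mathbf{w}^{\mathrm{T}}\mathbf{R}\mathbf{w} - \mathbf{w}^{\mathrm{T}}\mathbf{R}\mathbf{w}_o - \mathbf{w}_o^{\mathrm{T}}\mathbf{R}^{\mathrm{T}}\mathbf{w} + \mathbf{w}_o^{\mathrm{T}}\mathbf{R}\mathbf{w}_o + E[d^2(n)] - \mathbf{w}_o^{\mathrm{T}}\mathbf{R}\mathbf{w}_o \tag{4.84}$$

As $\mathbf{R}^{\mathrm{T}} = \mathbf{R}$, the first four terms on the right-hand side of Eq. (4.84) can be combined to obtain

$$\xi = (\mathbf{w} - \mathbf{w}_o)^{\mathrm{T}}\mathbf{R}(\mathbf{w} - \mathbf{w}_o) + E[d^2(n)] - \mathbf{w}_o^{\mathrm{T}}\mathbf{R}\mathbf{w}_o \tag{4.85}$$

We may also recall from Chapter 3 that

$$\xi_{\min} = E[d^2(n)] - \mathbf{w}_o^{\mathrm{T}}\mathbf{R}\mathbf{w}_o \tag{4.86}$$

where $\xi_{\min}$ is the minimum value of $\xi$, which is obtained when $\mathbf{w} = \mathbf{w}_o$. Substituting Eq. (4.86) in Eq. (4.85), we get

$$\xi = \xi_{\min} + (\mathbf{w} - \mathbf{w}_o)^{\mathrm{T}}\mathbf{R}(\mathbf{w} - \mathbf{w}_o) \tag{4.87}$$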


This result has the following interpretation. The nonnegative definiteness of the correlation matrix $\mathbf{R}$ implies that the second term on the right-hand side of Eq. (4.87) is nonnegative. When $\mathbf{R}$ is positive definite (a case very likely to happen in practice), the second term on the right-hand side of Eq. (4.87) is zero only when $\mathbf{w} = \mathbf{w}_o$, and in that case $\xi$ coincides with its minimum value. This is depicted in Figure 4.4, where a typical performance surface of a two-tap Wiener filter is presented by a set of contours, which correspond to different levels of $\xi$, with $\xi_{\min} < \xi_1 < \xi_2 < \cdots$.

Figure 4.4 A typical performance surface of a two-tap transversal filter.
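
The equivalence of Eqs. (4.81) and (4.87) can be checked numerically for a two-tap filter. The sketch below uses hypothetical values for R, p, and E[d^2(n)] (they are assumptions made for illustration), solves the Wiener-Hopf equation (4.82) with Cramer's rule, and evaluates the performance function in both forms at a few arbitrary tap-weight vectors.

```python
alpha = 0.5                      # hypothetical input correlation
R = [[1.0, alpha], [alpha, 1.0]]
p = [1.0, 1.0]                   # hypothetical cross-correlation vector
d_power = 2.0                    # hypothetical E[d^2(n)]

def quad(a, m, b):
    """Bilinear form a^T M b for 2-element vectors."""
    return sum(a[r] * m[r][c] * b[c] for r in range(2) for c in range(2))

# Wiener-Hopf solution of Eq. (4.82) for the 2-by-2 case (Cramer's rule).
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
w_o = [(R[1][1] * p[0] - R[0][1] * p[1]) / det,
       (R[0][0] * p[1] - R[1][0] * p[0]) / det]
xi_min = d_power - quad(w_o, R, w_o)                     # Eq. (4.86)

def xi_direct(w):
    """Eq. (4.81): w^T R w - 2 p^T w + E[d^2(n)]."""
    return quad(w, R, w) - 2.0 * (p[0] * w[0] + p[1] * w[1]) + d_power

def xi_shifted(w):
    """Eq. (4.87): xi_min + (w - w_o)^T R (w - w_o)."""
    v = [w[0] - w_o[0], w[1] - w_o[1]]
    return xi_min + quad(v, R, v)

for w in ([0.0, 0.0], [1.0, -2.0], [0.3, 0.9]):
    print(w, xi_direct(w), xi_shifted(w))
```

Both columns of the printout should agree, and neither should fall below `xi_min`, since R is positive definite for this choice of `alpha`.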


To proceed further, we define the vector

$$\mathbf{v} \triangleq \mathbf{w} - \mathbf{w}_o \tag{4.88}$$

and substitute it in Eq. (4.87) to obtain

$$\xi = \xi_{\min} + \mathbf{v}^{\mathrm{T}}\mathbf{R}\mathbf{v} \tag{4.89}$$

This simpler form of the performance function in effect is equivalent to shifting the origin of the N-dimensional Euclidean space defined by the elements of $\mathbf{w}$ to the point $\mathbf{w} = \mathbf{w}_o$. The new Euclidean space has a new set of axes given by $v_0, v_1, \ldots, v_{N-1}$ (Figure 4.4). These are in parallel with the original axes $w_0, w_1, \ldots, w_{N-1}$. Obviously, the shape of the performance surface is not affected by the shift in the origin.

To simplify Eq. (4.89) further, we use the unitary similarity transformation, that is, Eq. (4.19) of the last section, which, for real-valued signals, is written as

$$\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{\mathrm{T}} \tag{4.90}$$

Substituting Eq. (4.90) in Eq. (4.89), we obtain

$$\xi = \xi_{\min} + \mathbf{v}^{\mathrm{T}}\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{\mathrm{T}}\mathbf{v} \tag{4.91}$$

We define

$$\mathbf{v}' = \mathbf{Q}^{\mathrm{T}}\mathbf{v} \tag{4.92}$$

and note that multiplication of the vector $\mathbf{v}$ by the unitary matrix $\mathbf{Q}^{\mathrm{T}}$ is equivalent to rotating the v-axes to a new set of axes given by $v'_0, v'_1, \ldots, v'_{N-1}$, as depicted in Figure 4.4. The new axes are in the directions specified by the rows of the transformation matrix $\mathbf{Q}^{\mathrm{T}}$. We may further note that the rows of $\mathbf{Q}^{\mathrm{T}}$ are the eigenvectors of the correlation matrix $\mathbf{R}$. This means that the v'-axes, defined by Eq. (4.92), are in the directions of the basis vectors specified by the eigenvectors of $\mathbf{R}$.

Substituting Eq. (4.92) in Eq. (4.91), we obtain

$$\xi = \xi_{\min} + \mathbf{v}'^{\mathrm{T}}\boldsymbol{\Lambda}\mathbf{v}' \tag{4.93}$$


This is known as the canonical form of the performance function. Expanding Eq. (4.93) in terms of the elements of the vector $\mathbf{v}'$ and the diagonal elements of the matrix $\boldsymbol{\Lambda}$, we get

$$\xi = \xi_{\min} + \sum_{i=0}^{N-1} \lambda_i v_i'^2 \tag{4.94}$$

This, when compared with the previous forms of the performance function in Eqs. (4.81) and (4.89), is a much easier function to visualize. In particular, if all the variables $v'_0, v'_1, \ldots, v'_{N-1}$ except $v'_k$ are set to zero,

$$\xi = \xi_{\min} + \lambda_k v_k'^2 \tag{4.95}$$

This is a parabola whose minimum occurs at $v'_k = 0$. The parameter $\lambda_k$ determines the shape of the parabola, in the sense that for smaller values of $\lambda_k$ the resulting parabolas are wider (flatter in shape) when compared with those obtained for larger values of $\lambda_k$. This is demonstrated in Figure 4.5, where $\xi$, as a function of $v'_k$, is plotted for a few values of $\lambda_k$.

When all variables $v'_0, v'_1, \ldots, v'_{N-1}$ are varied simultaneously, the performance function $\xi$, in the (N + 1)-dimensional Euclidean space, is a hyperparabola. The path traced by $\xi$ as one moves along any of the axes $v'_0, v'_1, \ldots, v'_{N-1}$ is a parabola whose shape is determined by the corresponding eigenvalue.

The hyperparabola shape of the performance surface can be best understood in the case of a two-tap filter, when the performance surface can easily be visualized in the three-dimensional Euclidean space whose axes are the two independent taps of the filter and the function $\xi$; see Figure 3.4 as an example. Alternatively, the contour plots, such as those presented in Figure 4.4, may be used to visualize the performance surface in a very convenient way.

Figure 4.5 The effect of eigenvalues on the shape of the performance function when only one of the filter tap weights is varied.

For N = 2, the canonical form of the performance function is

$$\xi = \xi_{\min} + \lambda_0 v_0'^2 + \lambda_1 v_1'^2 \tag{4.96}$$

This may be rearranged as

$$\left(\frac{v'_0}{a_0}\right)^2 + \left(\frac{v'_1}{a_1}\right)^2 = 1 \tag{4.97}$$

where

$$a_0 = \sqrt{\frac{\xi - \xi_{\min}}{\lambda_0}} \tag{4.98}$$

and

$$a_1 = \sqrt{\frac{\xi - \xi_{\min}}{\lambda_1}} \tag{4.99}$$

Equation (4.97) represents an ellipse whose principal axes are along the $v'_0$ and $v'_1$ axes, and for $a_1 > a_0$, the lengths of its major and minor principal axes are $2a_1$ and $2a_0$, respectively. These are highlighted in Figure 4.6, where a typical plot of the ellipse defined by Eq. (4.97) is presented. We may also note that $a_1/a_0 = \sqrt{\lambda_0/\lambda_1}$. This implies that for a particular performance surface, the aspect ratio of the contour ellipses is fixed and is equal to the square root of the ratio of its eigenvalues. In other words, the eccentricity of the contour ellipses of a performance surface is determined by the ratio of the eigenvalues of the corresponding correlation matrix. A larger ratio of the eigenvalues results in more eccentric ellipses and, thus, a narrower bowl-shaped performance surface.
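
The relations (4.98) and (4.99) translate directly into a small helper. The sketch below computes the semi-axes of a contour ellipse for a given level of the performance function; the function name `contour_semi_axes` and the numerical values of the eigenvalues and of the minimum are hypothetical and serve only as an illustration.

```python
import math

def contour_semi_axes(xi, xi_min, lam0, lam1):
    """Semi-axes (a0, a1) of the contour ellipse of Eq. (4.97)."""
    excess = xi - xi_min
    if excess <= 0.0:
        raise ValueError("a contour exists only for xi > xi_min")
    return math.sqrt(excess / lam0), math.sqrt(excess / lam1)

# Hypothetical values: eigenvalues 1.5 and 0.5, minimum of 2/3.
lam0, lam1, xi_min = 1.5, 0.5, 2.0 / 3.0
for xi in (1.0, 2.0, 4.0):
    a0, a1 = contour_semi_axes(xi, xi_min, lam0, lam1)
    print("xi = %.1f: a0 = %.4f, a1 = %.4f, a1/a0 = %.4f" % (xi, a0, a1, a1 / a0))
```

The ratio a1/a0 printed for each level should be the same, which is the fixed aspect ratio noted above.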


Figure 4.6 A typical plot of the ellipse defined by Eq. (4.97).


Example 4.3 


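Consider the case where a two-tap transversal Wiener filter is characterized by the following parameters:

$$\mathbf{R} = \begin{bmatrix} 1 & \alpha \\ \alpha & 1 \end{bmatrix}, \quad \mathbf{p} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \text{and} \quad E[d^2(n)] = 2$$

We want to explore the performance surface of this filter for values of $\alpha$ ranging from 0 to 1.

The performance function of the filter is obtained by substituting the above parameters in Eq. (4.81). This gives

$$\xi = [w_0 \ \ w_1]\begin{bmatrix} 1 & \alpha \\ \alpha & 1 \end{bmatrix}\begin{bmatrix} w_0 \\ w_1 \end{bmatrix} - 2\,[1 \ \ 1]\begin{bmatrix} w_0 \\ w_1 \end{bmatrix} + 2 \tag{4.100}$$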


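Solving the Wiener-Hopf equation to obtain the optimum tap weights of the filter, we obtain

$$\mathbf{w}_o = \begin{bmatrix} w_{o,0} \\ w_{o,1} \end{bmatrix} = \begin{bmatrix} 1 & \alpha \\ \alpha & 1 \end{bmatrix}^{-1}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{1+\alpha}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{4.101}$$

Using this result, we get

$$\xi_{\min} = E[d^2(n)] - \mathbf{w}_o^{\mathrm{T}}\mathbf{p} = 2 - \left[\frac{1}{1+\alpha} \ \ \frac{1}{1+\alpha}\right]\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{2\alpha}{1+\alpha} \tag{4.102}$$

Also,

$$\xi = \xi_{\min} + (\mathbf{w} - \mathbf{w}_o)^{\mathrm{T}}\mathbf{R}(\mathbf{w} - \mathbf{w}_o) = \frac{2\alpha}{1+\alpha} + [v_0 \ \ v_1]\begin{bmatrix} 1 & \alpha \\ \alpha & 1 \end{bmatrix}\begin{bmatrix} v_0 \\ v_1 \end{bmatrix} \tag{4.103}$$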


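To convert this to its canonical form, we should first find the eigenvalues and eigenvectors of $\mathbf{R}$. To find the eigenvalues of $\mathbf{R}$, we should solve the characteristic equation

$$\det(\lambda\mathbf{I} - \mathbf{R}) = \begin{vmatrix} \lambda - 1 & -\alpha \\ -\alpha & \lambda - 1 \end{vmatrix} = 0 \tag{4.104}$$

Expanding Eq. (4.104), we obtain

$$(\lambda - 1)^2 - \alpha^2 = 0$$

which gives

$$\lambda_0 = 1 + \alpha \tag{4.105}$$

and

$$\lambda_1 = 1 - \alpha \tag{4.106}$$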


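The eigenvectors $\mathbf{q}_0 = [q_{00} \ \ q_{01}]^{\mathrm{T}}$ and $\mathbf{q}_1 = [q_{10} \ \ q_{11}]^{\mathrm{T}}$ of $\mathbf{R}$ are obtained by solving the equations

$$\begin{bmatrix} \lambda_0 - 1 & -\alpha \\ -\alpha & \lambda_0 - 1 \end{bmatrix}\begin{bmatrix} q_{00} \\ q_{01} \end{bmatrix} = \mathbf{0} \tag{4.107}$$

and

$$\begin{bmatrix} \lambda_1 - 1 & -\alpha \\ -\alpha & \lambda_1 - 1 \end{bmatrix}\begin{bmatrix} q_{10} \\ q_{11} \end{bmatrix} = \mathbf{0} \tag{4.108}$$

Substituting Eqs. (4.105) and (4.106) in Eqs. (4.107) and (4.108), respectively, we obtain

$$q_{00} = q_{01} \quad \text{and} \quad q_{10} = -q_{11}.$$

Using these results and normalizing $\mathbf{q}_0$ and $\mathbf{q}_1$ to have lengths of unity, we obtain

$$\mathbf{q}_0 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{q}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$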


It may be noted that the eigenvectors $\mathbf{q}_0$ and $\mathbf{q}_1$ of $\mathbf{R}$ are independent of the parameter $\alpha$. This is an interesting property of the correlation matrices of two-tap transversal filters, which implies that the v'-axes are always obtained by a 45° rotation of the v-axes. The eigenvectors associated with the correlation matrices of three-tap transversal filters also have some special form. This is discussed in Problem P4.5.

With the above results, we get

$$\mathbf{v}' = \begin{bmatrix} v'_0 \\ v'_1 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} v_0 \\ v_1 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} v_0 + v_1 \\ v_0 - v_1 \end{bmatrix} \tag{4.109}$$

and

$$\xi = \frac{2\alpha}{1+\alpha} + (1+\alpha)v_0'^2 + (1-\alpha)v_1'^2 \tag{4.110}$$


Figures 4.7a, b, and c show the contour plots of the performance surface of the two-tap transversal filter for $\alpha$ = 0.5, 0.8, and 0.95, which correspond to the eigenvalue ratios of 3, 9, and 39, respectively. These plots clearly show how the eccentricity of the performance surface changes as the eigenvalue ratio of the correlation matrix $\mathbf{R}$ increases.
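
The steps of this example can be retraced numerically. The sketch below evaluates the performance function once in the tap-weight coordinates of Eq. (4.100) and once in the canonical form of Eq. (4.110), after the shift of origin and the 45° rotation of Eq. (4.109), and also reports the eigenvalue ratio for the three values of α quoted above. The function names and the test point (0.3, -1.2) are arbitrary illustrative choices.

```python
import math

def xi_original(w0, w1, alpha):
    """Eq. (4.100): performance function in the tap-weight coordinates."""
    return (w0 * w0 + 2.0 * alpha * w0 * w1 + w1 * w1) - 2.0 * (w0 + w1) + 2.0

def xi_canonical(w0, w1, alpha):
    """Eq. (4.110), reached through Eqs. (4.88) and (4.109)."""
    w_opt = 1.0 / (1.0 + alpha)              # both optimum taps, Eq. (4.101)
    v0, v1 = w0 - w_opt, w1 - w_opt          # shift of origin
    vp0 = (v0 + v1) / math.sqrt(2.0)         # 45-degree rotation
    vp1 = (v0 - v1) / math.sqrt(2.0)
    xi_min = 2.0 * alpha / (1.0 + alpha)
    return xi_min + (1.0 + alpha) * vp0 ** 2 + (1.0 - alpha) * vp1 ** 2

for alpha in (0.5, 0.8, 0.95):
    ratio = (1.0 + alpha) / (1.0 - alpha)
    gap = abs(xi_original(0.3, -1.2, alpha) - xi_canonical(0.3, -1.2, alpha))
    print("alpha = %.2f: eigenvalue ratio = %.1f, mismatch = %.2e" % (alpha, ratio, gap))
```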


The above results may be generalized as follows. The performance surface of an N-tap transversal filter with real-valued data is a hyperparaboloid in the (N + 1)-dimensional Euclidean space whose axes are the N tap-weight variables of the filter and the performance function $\xi$. The performance function may also be represented by a set of hyperellipses in the N-dimensional Euclidean space of the filter tap-weight variables. Each hyperellipse corresponds to a fixed value of $\xi$. The directions of the principal axes of the hyperellipses are determined by the eigenvectors of the correlation matrix $\mathbf{R}$. The lengths of the principal axes of each hyperellipse are proportional to the square root of the inverse of the corresponding eigenvalues. Thus, the eccentricity of the hyperellipses is determined by the spread of the eigenvalues of the correlation matrix $\mathbf{R}$. This shows that the shape of the performance surface of a Wiener FIR filter is directly related to the spread of the eigenvalues of $\mathbf{R}$. In addition, from Property 8 of the eigenvalues and eigenvectors, we recall that the spread of the eigenvalues of the correlation matrix of a stochastic process $\{x(n)\}$ is directly linked to the variation of the power spectral density function $\Phi_{xx}(e^{j\omega})$ of the process. This, in turn, means there is a close relationship between the power spectral density of a random process and the shape of the performance surface of an FIR Wiener filter for which that process is used as input.

The above results can easily be extended to the case where the filter coefficients, input, and desired output are complex-valued. One should only remember that the elements of all the involved vectors and matrices are complex-valued and replace all the transpose operators in the developed equations by Hermitian operators. Doing this, Eq. (4.93) becomes

$$\xi = \xi_{\min} + \mathbf{v}'^{\mathrm{H}}\boldsymbol{\Lambda}\mathbf{v}' \tag{4.111}$$

This can be expanded as

$$\xi = \xi_{\min} + \sum_{i=0}^{N-1} \lambda_i|v'_i|^2 \tag{4.112}$$

The difference between this result and its dual (for the real-valued case) in Eq. (4.94) is an additional modulus sign on the $v'_i$ in Eq. (4.112). This, of course, is due to the fact that here the $v'_i$ are complex-valued.

The performance function $\xi$ of Eq. (4.112) may be thought of as a hyperparabola in the (N + 1)-dimensional space whose first N axes are defined by the complex-valued variables $v'_i$ and whose (N + 1)th axis is the real-valued performance function $\xi$. To prevent such a mixed domain and have a clearer picture of the performance surface in the case of complex signals, we may expand Eq. (4.112) further by replacing $v'_i$ with $v'_{i,R} + jv'_{i,I}$, where $v'_{i,R}$ and $v'_{i,I}$ are the real and imaginary parts of $v'_i$. With this, we obtain

$$\xi = \xi_{\min} + \sum_{i=0}^{N-1} \lambda_i\left(v_{i,R}'^2 + v_{i,I}'^2\right) \tag{4.113}$$

Here, $v'_{i,R}$ and $v'_{i,I}$ are both real-valued variables. Equation (4.113) shows that the performance surface of an N-tap transversal Wiener filter with complex-valued coefficients is a hyperparabola in the (2N + 1)-dimensional Euclidean space of the variables consisting of the real and imaginary parts of the filter coefficients and the performance function.

Figure 4.7 Performance surface of a two-tap transversal filter for different eigenvalue spreads of R: (a) λ0/λ1 = 3, (b) λ0/λ1 = 9, and (c) λ0/λ1 = 39.


Problems 


P4.1 Consider the performance function

$$\xi = w_0^2 + w_1^2 + w_0w_1 - w_0 + w_1 + 1.$$


(i) Convert this to its canonical form.
(ii) Plot the set of contour ellipses of the performance surface of $\xi$ for values of $\xi$ = 1, 2, 3, and 4.

P4.2 $\mathbf{R}$ is a correlation matrix.

(i) Using the unitary similarity transformation, show that for any integer n
$$\mathbf{R}^n = \mathbf{Q}\boldsymbol{\Lambda}^n\mathbf{Q}^{\mathrm{H}}.$$
(ii) The matrix $\mathbf{R}^{1/2}$ with the property $\mathbf{R}^{1/2}\mathbf{R}^{1/2} = \mathbf{R}$ is defined as the square root of $\mathbf{R}$. Show that
$$\mathbf{R}^{1/2} = \mathbf{Q}\boldsymbol{\Lambda}^{1/2}\mathbf{Q}^{\mathrm{H}}.$$
(iii) Show that the identity
$$\mathbf{R}^{a} = \mathbf{Q}\boldsymbol{\Lambda}^{a}\mathbf{Q}^{\mathrm{H}}$$
is valid for any rational number a.

P4.3 Consider the correlation matrix $\mathbf{R}$ of an N-by-1 observation vector $\mathbf{x}(n)$, and an arbitrary N-by-N unitary transformation matrix $\mathbf{U}$. Define the vector
$$\mathbf{x}_U(n) = \mathbf{U}\mathbf{x}(n)$$
and its corresponding correlation matrix $\mathbf{R}_U = E[\mathbf{x}_U(n)\mathbf{x}_U^{\mathrm{H}}(n)]$.

(i) Show that $\mathbf{R}$ and $\mathbf{R}_U$ share the same set of eigenvalues.
(ii) Find an expression for the eigenvectors of $\mathbf{R}_U$ in terms of the eigenvectors of $\mathbf{R}$ and the transformation matrix $\mathbf{U}$.

P4.4 In Example 4.3, we noted that the eigenvectors of the correlation matrix of any two-tap transversal filter with real-valued input are fixed and are
$$\mathbf{q}_0 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{q}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
Plot the magnitude responses of the eigenfilters defined by $\mathbf{q}_0$ and $\mathbf{q}_1$ and verify that $\mathbf{q}_0$ corresponds to a low-pass filter and $\mathbf{q}_1$ corresponds to a high-pass one. How do you relate this observation with the minimax theorem?

P4.5 Consider the correlation matrix $\mathbf{R}$ of a three-tap transversal filter with a real-valued input $x(n)$.

(i) Show that when $E[x^2(n)] = 1$, $\mathbf{R}$ has the form
$$\mathbf{R} = \begin{bmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{bmatrix}$$
(ii) Show that
$$\mathbf{q}_0 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$
is an eigenvector of $\mathbf{R}$ and find its corresponding eigenvalue.
(iii) Show that the other eigenvectors of $\mathbf{R}$ are
$$\mathbf{q}_i = \frac{1}{\sqrt{2 + \alpha_i^2}}\begin{bmatrix} 1 \\ \alpha_i \\ 1 \end{bmatrix}, \quad \text{for } i = 1, 2$$
where
$$\alpha_1 = \frac{-\rho_2/\rho_1 + \sqrt{(\rho_2/\rho_1)^2 + 8}}{2} \quad \text{and} \quad \alpha_2 = \frac{-\rho_2/\rho_1 - \sqrt{(\rho_2/\rho_1)^2 + 8}}{2}$$
Find the eigenvalues that correspond to $\mathbf{q}_1$ and $\mathbf{q}_2$.
(iv) For the following numerical values, plot the magnitude responses of the eigenfilters defined by $\mathbf{q}_0$, $\mathbf{q}_1$, and $\mathbf{q}_2$, and find that in all cases these correspond to a band-pass, a low-pass, and a high-pass filter, respectively.

(a) $\rho_1 = 0.5$, $\rho_2 = 0.25$.
(b) $\rho_1 = 0.8$, $\rho_2 = 0.3$.
(c) $\rho_2 = 0.9$, $\rho_1 = -0.4$.

How do you relate this observation to the minimax theorem?
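P4.6 Consider the correlation matrix $\mathbf{R}$ of an observation vector $\mathbf{x}(n)$. Define the vector
$$\tilde{\mathbf{x}}(n) = \mathbf{R}^{-1/2}\mathbf{x}(n)$$
where $\mathbf{R}^{-1/2}$ is the inverse of $\mathbf{R}^{1/2}$, and $\mathbf{R}^{1/2}$ is defined as in Problem P4.2. Show that the correlation matrix of $\tilde{\mathbf{x}}(n)$ is the identity matrix.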


P4.7 Consider the case discussed in Example 4.2, and the eigenvector $\mathbf{q}_0$ as defined by
Eq. (4.76). Show that any vector $\mathbf{q}_i$ that is orthogonal to $\mathbf{q}_0$ (i.e., $\mathbf{q}_i^H\mathbf{q}_0 = 0$)
is a solution to the equation

$\mathbf{R}\mathbf{q}_i = \sigma_\nu^2\,\mathbf{q}_i$.


P4.8 The determinant of an N-by-N matrix $\mathbf{A}$ can be obtained by iterating the equation

$\det(\mathbf{A}) = \sum_{j=0}^{N-1} a_{ij}\,\mathrm{cof}_{ij}(\mathbf{A})$

where $a_{ij}$ is the $ij$th element of $\mathbf{A}$, and $\mathrm{cof}_{ij}(\mathbf{A})$ denotes the $ij$th cofactor of $\mathbf{A}$,
which is defined as

$\mathrm{cof}_{ij}(\mathbf{A}) = (-1)^{i+j}\det(\mathbf{A}_{ij})$

where $\mathbf{A}_{ij}$ is the (N - 1)-by-(N - 1) matrix obtained by deleting the $i$th row and
$j$th column of $\mathbf{A}$. This procedure is general and applicable to all square matrices.
Use this procedure to show that

(i) Equation (4.24) is a valid result for any arbitrary square matrix $\mathbf{A}$.
(ii) For any square matrix $\mathbf{A}$

$\det(\mathbf{A}) = \prod_{i=0}^{N-1} \lambda_i$


where the $\lambda_i$s are the eigenvalues of $\mathbf{A}$.

P4.9 Give a proof for the minimax procedure suggested by Eqs. (4.31)-(4.33).

P4.10 Consider a filter whose input is the vector $\tilde{\mathbf{x}}(n)$, as defined in Problem P4.6, and
its output is $y(n) = \mathbf{w}^T\tilde{\mathbf{x}}(n)$, where $\mathbf{w}$ is the N-by-1 tap-weight vector of the filter.
Discuss the shape of the performance surface of this filter.

P4.11 Work out the details of the derivation of Eq. (4.70).

P4.12 Give a detailed derivation of Eq. (4.111).

P4.13 The input process to an N-tap transversal filter is

$x(n) = a_1 e^{j\omega_1 n} + a_2 e^{j\omega_2 n} + \nu(n)$

where $a_1$ and $a_2$ are uncorrelated complex-valued zero-mean random variables
with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, and $\{\nu(n)\}$ is a white noise process with
variance unity.

(i) Derive an equation for the correlation matrix, $\mathbf{R}$, of the observation vector
at the filter input.
(ii) Following an argument similar to the one in Example 4.2, show that the
smallest N - 2 eigenvalues of $\mathbf{R}$ are all equal to $\sigma_\nu^2$.
(iii) Let

$\mathbf{u}_0 = \frac{1}{\sqrt{N}} [1 \; e^{-j\omega_1} \; e^{-j2\omega_1} \; \cdots \; e^{-j(N-1)\omega_1}]^T$

and

$\mathbf{u}_1 = \frac{1}{\sqrt{N}} [1 \; e^{-j\omega_2} \; e^{-j2\omega_2} \; \cdots \; e^{-j(N-1)\omega_2}]^T$

Show that the eigenvectors corresponding to the largest two eigenvalues of
$\mathbf{R}$ are

$\mathbf{q}_0 = \alpha_{00}\mathbf{u}_0 + \alpha_{01}\mathbf{u}_1$

and

$\mathbf{q}_1 = \alpha_{10}\mathbf{u}_0 + \alpha_{11}\mathbf{u}_1$

where $\alpha_{00}$, $\alpha_{01}$, $\alpha_{10}$, and $\alpha_{11}$ are a set of coefficients to be found. Propose a
minimax procedure for finding these coefficients.
(iv) Find the coefficients $\alpha_{00}$, $\alpha_{01}$, $\alpha_{10}$, and $\alpha_{11}$ of Part (iii) in the case where
$\mathbf{u}_0^H\mathbf{u}_1 = 0$. Discuss the uniqueness of the answer in the cases where $\sigma_1^2 \neq \sigma_2^2$
and $\sigma_1^2 = \sigma_2^2$.
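P4.14 Equation (4.113) suggests that the performance surface of an N-tap FIR Wiener
filter with complex-valued input is equivalent to the performance surface of a
2N-tap filter with real-valued input. Furthermore, the eigenvalues corresponding
to the latter surface appear with multiplicity of at least two. This problem suggests
an alternative procedure, which also leads to the same results.

(i) Show that the Hermitian form $\mathbf{w}^H\mathbf{R}\mathbf{w}$ may be expanded as

$\mathbf{w}^H\mathbf{R}\mathbf{w} = [\mathbf{w}_R^T \; \mathbf{w}_I^T] \begin{bmatrix} \mathbf{R}_R & -\mathbf{R}_I \\ \mathbf{R}_I & \mathbf{R}_R \end{bmatrix} \begin{bmatrix} \mathbf{w}_R \\ \mathbf{w}_I \end{bmatrix}$

where the subscripts $R$ and $I$ refer to real and imaginary parts.
Hint: Note that $\mathbf{R}_I^T = -\mathbf{R}_I$, and this implies that for any arbitrary vector $\mathbf{v}$,
$\mathbf{v}^T\mathbf{R}_I\mathbf{v} = 0$.
(ii) Show that the equation

$\mathbf{R}\mathbf{q}_i = \lambda_i\mathbf{q}_i$ (P4.14.1)

implies

$\begin{bmatrix} \mathbf{R}_R & -\mathbf{R}_I \\ \mathbf{R}_I & \mathbf{R}_R \end{bmatrix} \begin{bmatrix} \mathbf{q}_{i,R} \\ \mathbf{q}_{i,I} \end{bmatrix} = \lambda_i \begin{bmatrix} \mathbf{q}_{i,R} \\ \mathbf{q}_{i,I} \end{bmatrix}$

Also, multiplying Eq. (P4.14.1) through by $j = \sqrt{-1}$, we get $\mathbf{R}(j\mathbf{q}_i) =
\lambda_i(j\mathbf{q}_i)$. Show that this implies

$\begin{bmatrix} \mathbf{R}_R & -\mathbf{R}_I \\ \mathbf{R}_I & \mathbf{R}_R \end{bmatrix} \begin{bmatrix} -\mathbf{q}_{i,I} \\ \mathbf{q}_{i,R} \end{bmatrix} = \lambda_i \begin{bmatrix} -\mathbf{q}_{i,I} \\ \mathbf{q}_{i,R} \end{bmatrix}$

Relate these with Eq. (4.113).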


Computer-Oriented Problems 


The following problems involve numerical evaluation/analysis of large matrices.
MATLAB will work best for completing the solutions to these problems.


P4.15 Consider a random process $x(n) = \nu(n) + \cos(0.3\pi n + \theta)$, where $\nu(n)$ is a white
process with power spectral density $\Phi_{\nu\nu}(e^{j\omega}) = \sigma_\nu^2$, and $\theta$ is a random variable
uniformly distributed in the interval 0 to $2\pi$.

(i) Find an expression for the autocorrelation coefficients $\phi_{xx}(k)$.

(ii) Using the result of (i), find an expression for the power spectral density
$\Phi_{xx}(e^{j\omega})$.

(iii) Present the autocorrelation matrix $\mathbf{R}$ of $\mathbf{x}(n) = [x(n) \; x(n-1) \; \cdots \; x(n-N+1)]^T$.

(iv) For $N = 10$ and $\sigma_\nu^2 = 0.01$, find the numerical results for the eigenvectors
and eigenvalues of $\mathbf{R}$.

(v) Repeat (iv), for $N = 10$ and the choices of $\sigma_\nu^2 = 0$ and 0.1 and compare your
results of the three choices of $\sigma_\nu^2$. Explain any relationship that you may
find and explain your findings with the theoretical results in this chapter.

(vi) Present a plot of $\Phi_{xx}(e^{j\omega})$, in a form similar to Figure 4.3, within the
normalized frequency range 0 to 1.

(vii) For the numerical results evaluated in (iv) and (v), add the plots of the
magnitude responses of the associated eigenfilters to the results of (vi). Make
the observation that all eigenfilters, except one, have zeros at the normalized
frequencies 0.15 and 0.85. Following the argument made in Example 4.2,
explain this observation.

P4.16 Repeat Problem P4.15 for the case where $x(n) = \nu(n) + e^{j(0.3\pi n + \theta)}$ and explain
differences that you observe in the results compared to those in Problem P4.15.

P4.17 Consider a random process $x(n) = \nu(n) + \cos(0.3\pi n + \theta_1) + \cos(0.5\pi n + \theta_2)$,
where $\nu(n)$ is a white process with power spectral density $\Phi_{\nu\nu}(e^{j\omega}) = \sigma_\nu^2$, and
$\theta_1$ and $\theta_2$ are two independent random variables both uniformly distributed in the
interval 0 to $2\pi$.


(i) Find an expression for the autocorrelation coefficients $\phi_{xx}(k)$.

(ii) Using the result of (i), find an expression for the power spectral density
$\Phi_{xx}(e^{j\omega})$.

(iii) Present the autocorrelation matrix $\mathbf{R}$ of $\mathbf{x}(n) = [x(n) \; x(n-1) \; \cdots \; x(n-N+1)]^T$.

(iv) For $N = 10$ and $\sigma_\nu^2 = 0.01$, find the numerical results for the eigenvectors
and eigenvalues of $\mathbf{R}$.

(v) Repeat (iv), for $N = 10$ and the choices of $\sigma_\nu^2 = 0$ and 0.1 and compare your
results of the three choices of $\sigma_\nu^2$. Explain any relationship that you may
find and explain your findings with the theoretical results in this chapter.

(vi) Present a plot of $\Phi_{xx}(e^{j\omega})$, in a form similar to Figure 4.3, within the
normalized frequency range 0 to 1.

(vii) For the numerical results evaluated in (iv) and (v), add the plots of the
magnitude responses of the associated eigenfilters to the results of (vi).
Make the observation that all eigenfilters, except two, have zeros at the
normalized frequencies 0.15, 0.25, 0.75, and 0.85. Expand the argument
made in Example 4.2 to explain this observation.

P4.18 Consider the case where a random process $\{x(n)\}$ is generated as in Figure P4.18.
Let $\mathbf{x}(n) = [x(n) \; x(n-1) \; \cdots \; x(n-9)]^T$.

(i) Find and present the correlation matrix $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^H(n)]$.
(ii) Find and present the eigenvalues, $\lambda_i$, and eigenvectors, $\mathbf{q}_i$, of $\mathbf{R}$.
(iii) Present a plot of the power spectral density $\Phi_{xx}(e^{j\omega})$.
(iv) Add the plots of the magnitude responses of the eigenfilters of $\mathbf{R}$ to the
power spectral density plot in Part (iii).
(v) Make an attempt to relate the plots in Part (iv) to the eigenvalues in Part (ii).


Figure P4.20 (diagram not reproduced; recovered input label: unit variance white process)

P4.19 Repeat Problem P4.18 for the case where $H(z) = 1 + 2z^{-1} + z^{-2}$.

P4.20 Repeat Problems P4.18 and P4.19 for the case where $\{x(n)\}$ is generated as in
Figure P4.20. Here, $\nu(n)$ is a white random process with variance of $\sigma_\nu^2 = 0.1$.
Compare the eigenvalues and eigenvectors obtained here with those in
Problems P4.18 and P4.19.


5 


Search Methods 


In the last two chapters, we established that the optimum tap weights of a transversal 
Wiener filter can be obtained by solving the Wiener—Hopf equation, provided the required 
statistics of the underlying signals are available. We arrived at this solution by minimizing 
a cost function, which is a quadratic function of the filter tap-weight vector. An alternative 
way of finding the optimum tap weights of a transversal filter is to use an iterative search 
algorithm that starts at some arbitrary initial point in the tap-weight vector space and 
progressively moves toward the optimum tap-weight vector in steps. Each step is chosen 
so that the underlying cost function is reduced. If the cost function is convex (which is 
so for the transversal filter problem), such an iterative search procedure is guaranteed 
to converge to the optimum solution. The principle of finding the optimum tap-weight 
vector by progressive minimization of the underlying cost function by means of an iterative 
algorithm is central to the development of adaptive algorithms, which will be extensively 
discussed in the forthcoming chapters of this book. Using a highly simplified language, we 
might state at this point that adaptive algorithms are nothing but iterative search algorithms 
derived for minimizing the underlying cost function with the true statistics replaced by 
their estimates obtained in some manner. Hence, a very thorough understanding of the 
iterative algorithms from the point of view of their development and convergence property 
is an essential prerequisite for the study of adaptive algorithms, and this is the subject of 
this chapter. 

In this chapter, we discuss two gradient-based iterative methods for searching the per- 
formance surface of a transversal Wiener filter to find the tap weights that correspond to its 
minimum point. These methods are idealized versions of the class of practical algorithms, 
which will be presented in the next few chapters. We assume that the correlation matrix of 
the input samples to the filter and the cross-correlation vector between the desired output 
and filter input are known a priori. 

The first method that we discuss is known as the method of steepest descent. The basic 
concept behind this method is simple. Assuming that the cost function to be minimized is 
convex, one may start with an arbitrary point on the performance surface and take a small 
step in the direction in which the cost function decreases fastest. This corresponds to a 
step along the steepest-descent slope of the performance surface at that point. Repeating 
this successively, convergence toward the bottom of the performance surface, at which 
point the set of parameters that minimize the cost function assume their optimum values, 
is guaranteed. For transversal Wiener filters, we find that this method may suffer from
slow convergence. The second method that we introduce can overcome this problem at
the cost of additional complexity. This method, known as Newton's method, takes
steps in the direction pointing toward the bottom of the performance surface.
Our discussion in this chapter is limited to the case where the filter tap weights, input,
and desired output are real-valued. The extension of the results to the case of
complex-valued signals is straightforward and deferred to a problem at the end of the chapter.


5.1 Method of Steepest Descent 


Consider a transversal Wiener filter, as in Figure 5.1. The filter input, $x(n)$, and its
desired output, $d(n)$, are assumed to be real-valued sequences. The filter tap weights,
$w_0, w_1, \ldots, w_{N-1}$, are also assumed to be real-valued. The filter tap-weight and input
vectors are defined, respectively, by the column vectors

$\mathbf{w} = [w_0 \; w_1 \; \cdots \; w_{N-1}]^T$ (5.1)

and

$\mathbf{x}(n) = [x(n) \; x(n-1) \; \cdots \; x(n-N+1)]^T$ (5.2)

where superscript T stands for transpose. The filter output is

$y(n) = \mathbf{w}^T\mathbf{x}(n)$ (5.3)

We recall from Chapter 3 that the optimum tap-weight vector $\mathbf{w}_o$ is the one that
minimizes the performance function

$\xi = E[e^2(n)]$ (5.4)


Figure 5.1 A transversal filter.

where $e(n) = d(n) - y(n)$ is the estimation error of the Wiener filter. Also, we recall that
the performance function $\xi$ can be expanded as

$\xi = E[d^2(n)] - 2\mathbf{w}^T\mathbf{p} + \mathbf{w}^T\mathbf{R}\mathbf{w}$ (5.5)

where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$ is the autocorrelation matrix of the filter input and
$\mathbf{p} = E[\mathbf{x}(n)d(n)]$ is the cross-correlation vector between the filter input and its desired
output. The function $\xi$ (whose details were given in the last chapter) is a quadratic
function of the filter tap-weight vector $\mathbf{w}$. It has a single global minimum, which can be
obtained by solving the Wiener-Hopf equation

$\mathbf{R}\mathbf{w}_o = \mathbf{p}$ (5.6)

if $\mathbf{R}$ and $\mathbf{p}$ are available. Here, we assume that $\mathbf{R}$ and $\mathbf{p}$ are available, but resort to a
different approach to find $\mathbf{w}_o$. Instead of solving Eq. (5.6) directly, we start with an
initial guess for $\mathbf{w}_o$, say $\mathbf{w}(0)$, and use a recursive search that may require many
iterations (steps) to converge to $\mathbf{w}_o$. An understanding of this method is basic to the
development of the iterative algorithms that are commonly used in the implementation of
adaptive filters in practice.

The method of steepest descent is a general scheme that uses the following steps to 
search for the minimum point of any convex function of a set of parameters: 


1. Start with an initial guess of the parameters whose optimum values are to be found 
for minimizing the function. 

2. Find the gradient of the function with respect to these parameters at the present point. 

3. Update the parameters by taking a step in the opposite direction of the gradient vector 
obtained in Step 2. This corresponds to a step in the direction of steepest descent in 
the cost function at the present point. Furthermore, the size of the step taken is chosen 
proportional to the size of the gradient vector. 

4. Repeat Steps 2 and 3 until no further significant change is observed in the parameters. 


To implement this procedure in the case of the transversal filter shown in Figure 5.1,
we recall from Chapter 3 that

$\nabla\xi = 2\mathbf{R}\mathbf{w} - 2\mathbf{p}$ (5.7)

where $\nabla$ is the gradient operator defined as the column vector

$\nabla = \left[ \frac{\partial}{\partial w_0} \; \frac{\partial}{\partial w_1} \; \cdots \; \frac{\partial}{\partial w_{N-1}} \right]^T$ (5.8)

According to the above procedure, if $\mathbf{w}(k)$ is the tap-weight vector at the $k$th iteration,
the following recursive equation may be used to update $\mathbf{w}(k)$:

$\mathbf{w}(k+1) = \mathbf{w}(k) - \mu\nabla_k\xi$ (5.9)

where $\mu$ is a positive scalar called the step-size, and $\nabla_k\xi$ denotes the gradient vector $\nabla\xi$
evaluated at the point $\mathbf{w} = \mathbf{w}(k)$. Substituting Eq. (5.7) in Eq. (5.9), we get


$\mathbf{w}(k+1) = \mathbf{w}(k) - 2\mu(\mathbf{R}\mathbf{w}(k) - \mathbf{p})$ (5.10)

As we shall soon show, the convergence of $\mathbf{w}(k)$ to the optimum solution $\mathbf{w}_o$ and the
speed at which this convergence takes place are dependent on the size of the step-size
parameter $\mu$. A large step-size may result in divergence of this recursive equation.

To see how the recursive update $\mathbf{w}(k)$ converges toward $\mathbf{w}_o$, we rearrange Eq. (5.10)
as

$\mathbf{w}(k+1) = (\mathbf{I} - 2\mu\mathbf{R})\mathbf{w}(k) + 2\mu\mathbf{p}$ (5.11)

where $\mathbf{I}$ is the N-by-N identity matrix. Next, we substitute for $\mathbf{p}$ from Eq. (5.6). Also,
we subtract $\mathbf{w}_o$ from both sides of Eq. (5.11) and rearrange the result to obtain

$\mathbf{w}(k+1) - \mathbf{w}_o = (\mathbf{I} - 2\mu\mathbf{R})(\mathbf{w}(k) - \mathbf{w}_o)$ (5.12)

Defining the vector $\mathbf{v}(k)$ as

$\mathbf{v}(k) = \mathbf{w}(k) - \mathbf{w}_o$ (5.13)

and substituting this in Eq. (5.12), we obtain

$\mathbf{v}(k+1) = (\mathbf{I} - 2\mu\mathbf{R})\mathbf{v}(k)$ (5.14)


This is the tap-weight update equation in terms of the v-axes (see Chapter 4 for further
discussion on the v-axes). This result can be simplified further if we transform these to the
v'-axes (see Eq. (4.92) of Chapter 4 for the definition of v'-axes). Recall from Chapter 4
that $\mathbf{R}$ has the following unitary similarity decomposition

$\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$ (5.15)

where $\boldsymbol{\Lambda}$ is a diagonal matrix consisting of the eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$ of $\mathbf{R}$ and the
columns of $\mathbf{Q}$ contain the corresponding orthonormal eigenvectors. Substituting Eq. (5.15)
in Eq. (5.14) and replacing $\mathbf{I}$ with $\mathbf{Q}\mathbf{Q}^T$, we get

$\mathbf{v}(k+1) = (\mathbf{Q}\mathbf{Q}^T - 2\mu\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T)\mathbf{v}(k) = \mathbf{Q}(\mathbf{I} - 2\mu\boldsymbol{\Lambda})\mathbf{Q}^T\mathbf{v}(k)$ (5.16)

Premultiplying Eq. (5.16) by $\mathbf{Q}^T$ and recalling the transformation

$\mathbf{v}'(k) = \mathbf{Q}^T\mathbf{v}(k)$ (5.17)

we obtain the recursive equation in terms of v'-axes as

$\mathbf{v}'(k+1) = (\mathbf{I} - 2\mu\boldsymbol{\Lambda})\mathbf{v}'(k)$ (5.18)

The vector recursive Eq. (5.18) may be separated into the scalar recursive equations

$v'_i(k+1) = (1 - 2\mu\lambda_i)v'_i(k)$, for $i = 0, 1, \ldots, N-1$ (5.19)

where $v'_i(k)$ is the $i$th element of the vector $\mathbf{v}'(k)$.
Starting with a set of initial values $v'_0(0), v'_1(0), \ldots, v'_{N-1}(0)$ and iterating Eq. (5.19)
$k$ times, we get


$v'_i(k) = (1 - 2\mu\lambda_i)^k v'_i(0)$, for $i = 0, 1, \ldots, N-1$ (5.20)

From Eqs. (5.13) and (5.17), we see that $\mathbf{w}(k)$ converges to $\mathbf{w}_o$ if and only if $\mathbf{v}'(k)$
converges to the zero vector. But Eq. (5.20) implies that $\mathbf{v}'(k)$ can converge to zero if
and only if the step-size parameter $\mu$ is selected so that

$|1 - 2\mu\lambda_i| < 1$, for $i = 0, 1, \ldots, N-1$ (5.21)

When Eq. (5.21) is satisfied, the scalars $v'_i(k)$, for $i = 0, 1, \ldots, N-1$, exponentially
decay toward zero as the number of iterations, $k$, increases. Furthermore, Eq. (5.21)
provides the condition for the recursive equations (5.20) and, hence, the steepest-descent
algorithm to be stable. The inequalities (5.21) may be expanded as

$-1 < 1 - 2\mu\lambda_i < 1$

or

$0 < \mu < \frac{1}{\lambda_i}$, for $i = 0, 1, \ldots, N-1$ (5.22)

Noting that the step-size parameter $\mu$ is common for all values of $i$, convergence (stability)
of the steepest-descent algorithm is guaranteed only when

$0 < \mu < \frac{1}{\lambda_{\max}}$ (5.23)

where $\lambda_{\max}$ is the maximum of the eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$. The left limit in
Eq. (5.23) refers to the fact that the tap-weight correction must be in the opposite direction
of the gradient vector. The right limit is to ensure that all the scalar tap-weight parameters
in the recursive equations (5.19) decay exponentially as $k$ increases.

Figure 5.2 depicts a set of plots that show how a particular tap-weight parameter $v'_i(k)$
varies as a function of the iteration index $k$ for different values of the step-size
parameter $\mu$. The cases considered here correspond to the typical distinct ranges of $\mu$,
referred to as overdamped ($0 < \mu < 1/(2\lambda_i)$), underdamped ($1/(2\lambda_i) < \mu < 1/\lambda_i$), and
unstable ($\mu < 0$ or $\mu > 1/\lambda_i$).
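The three ranges of the step-size parameter can be checked numerically. The following Python sketch is an illustration only; the function names `classify_mode` and `mode_sequence` and the numerical values are assumptions made for this example. It evaluates the geometrical ratio factor $1 - 2\mu\lambda_i$ of Eq. (5.19) and generates the sequence of Eq. (5.20) for a single mode with a positive eigenvalue.

```python
def classify_mode(mu, lam):
    """Classify one mode of Eq. (5.19) by its ratio factor r = 1 - 2*mu*lam."""
    r = 1.0 - 2.0 * mu * lam
    if abs(r) >= 1.0:
        # Includes the marginal cases |r| = 1, which do not decay.
        return "unstable"
    if r >= 0.0:
        return "overdamped"      # monotonic decay, 0 < mu <= 1/(2*lam)
    return "underdamped"         # decay with alternating sign


def mode_sequence(mu, lam, v0, num_iterations):
    """Return [v'(0), v'(1), ..., v'(num_iterations)] from Eq. (5.20)."""
    r = 1.0 - 2.0 * mu * lam
    return [v0 * r ** k for k in range(num_iterations + 1)]


if __name__ == "__main__":
    lam = 1.0  # assumed eigenvalue, chosen only for illustration
    for mu in (0.2, 0.7, 1.2, -0.1):
        print(mu, classify_mode(mu, lam), mode_sequence(mu, lam, 1.0, 4))
```

With $\lambda_i = 1$, the four step-sizes in the usage loop fall in the overdamped, underdamped, and the two unstable ranges, respectively.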

Figure 5.2 Convergence of $v'_i(k)$ as a function of iteration index $k$, for different values of the
step-size parameter $\mu$. (a) Overdamped case: $0 < \mu < 1/(2\lambda_i)$, (b) Underdamped case:
$1/(2\lambda_i) < \mu < 1/\lambda_i$, (c) Unstable: $\mu < 0$, and (d) Unstable: $\mu > 1/\lambda_i$.

We may now derive a more explicit formulation for the transient behavior of the
steepest-descent algorithm in terms of the original tap-weight vector $\mathbf{w}(k)$. We note that

$\mathbf{w}(k) = \mathbf{w}_o + \mathbf{v}(k) = \mathbf{w}_o + \mathbf{Q}\mathbf{v}'(k) = \mathbf{w}_o + [\mathbf{q}_0 \; \mathbf{q}_1 \; \cdots \; \mathbf{q}_{N-1}] \begin{bmatrix} v'_0(k) \\ v'_1(k) \\ \vdots \\ v'_{N-1}(k) \end{bmatrix} = \mathbf{w}_o + \sum_{i=0}^{N-1} \mathbf{q}_i v'_i(k)$ (5.24)

where $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{N-1}$ are the eigenvectors associated with the eigenvalues $\lambda_0, \lambda_1, \ldots,
\lambda_{N-1}$ of the correlation matrix $\mathbf{R}$. Substituting Eq. (5.20) in Eq. (5.24), we obtain

$\mathbf{w}(k) = \mathbf{w}_o + \sum_{i=0}^{N-1} v'_i(0)(1 - 2\mu\lambda_i)^k \mathbf{q}_i$ (5.25)

This result shows that the transient behavior of the steepest-descent algorithm for an
N-tap transversal filter is determined by a sum of N exponential terms, each of which
is controlled by one of the eigenvalues of the correlation matrix $\mathbf{R}$. Each eigenvalue $\lambda_i$
determines a particular mode of convergence in the direction defined by its associated
eigenvector $\mathbf{q}_i$. The various modes work independently of one another. For a selected value
of the step-size parameter $\mu$, the geometrical ratio factor $1 - 2\mu\lambda_i$, which determines
how fast the $i$th mode converges, is determined by the value of $\lambda_i$.


Figure 5.3 A modeling problem. 


Example 5.1 


Consider the modeling problem depicted in Figure 5.3. The input signal, $x(n)$, is generated
by passing a white noise signal, $\nu(n)$, through a coloring filter with the system function

$H(z) = \frac{\sqrt{1 - \alpha^2}}{1 - \alpha z^{-1}}$ (5.26)

where $\alpha$ is a real-valued constant in the range of -1 to +1. The plant is a two-tap FIR
system with the system function

$P(z) = 1 - 4z^{-1}$

An adaptive filter with the system function

$W(z) = w_0 + w_1 z^{-1}$

is used to identify the plant system function. The steepest-descent algorithm is used to
find the optimum values of the tap weights $w_0$ and $w_1$. We want to see, as the iteration
number increases, how the tap weights $w_0$ and $w_1$ converge toward the plant coefficients
1 and -4, respectively. We examine this for different values of the parameter $\alpha$.

From the results derived in Example 4.1 of Chapter 4, we note that

$E[x^2(n)] = 1$ and $E[x(n)x(n-1)] = \alpha$

These give

$\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)] = \begin{bmatrix} 1 & \alpha \\ \alpha & 1 \end{bmatrix}$ (5.27)

where $\mathbf{x}(n) = [x(n) \; x(n-1)]^T$. Furthermore, the elements of the cross-correlation vector
$\mathbf{p} = E[\mathbf{x}(n)d(n)]$ are obtained as follows:

$p_0 = E[x(n)d(n)] = E[x(n)(x(n) - 4x(n-1))] = E[x^2(n)] - 4E[x(n)x(n-1)] = 1 - 4\alpha$

and

$p_1 = E[x(n-1)d(n)] = E[x(n-1)(x(n) - 4x(n-1))] = E[x(n-1)x(n)] - 4E[x^2(n-1)] = \alpha - 4$

These give

$\mathbf{p} = \begin{bmatrix} 1 - 4\alpha \\ \alpha - 4 \end{bmatrix}$ (5.28)

Substituting Eqs. (5.27) and (5.28) in Eq. (5.11), we get

$\begin{bmatrix} w_0(k+1) \\ w_1(k+1) \end{bmatrix} = \begin{bmatrix} 1 - 2\mu & -2\mu\alpha \\ -2\mu\alpha & 1 - 2\mu \end{bmatrix} \begin{bmatrix} w_0(k) \\ w_1(k) \end{bmatrix} + 2\mu \begin{bmatrix} 1 - 4\alpha \\ \alpha - 4 \end{bmatrix}$ (5.29)


Starting with an initial value $\mathbf{w}(0) = [w_0(0) \; w_1(0)]^T$ and letting the recursive equation
(5.29) run, we get two sequences of the tap-weight variables $w_0(k)$ and $w_1(k)$. We
may then plot $w_1(k)$ versus $w_0(k)$ to get the trajectory (path) that the steepest-descent
algorithm follows. Figure 5.4a, b, c, and d show four such trajectories that we have
obtained for values of $\alpha$ = 0, 0.5, 0.75, and 0.9, respectively. Also shown in the figures are
the contour plots, which highlight the performance surface of the filter. The convergence
of the algorithm along the steepest-descent slope of the performance surface can be
clearly seen. The results presented are for $\mu = 0.05$ and 30 iterations, for all cases. It is
interesting to note that in the case $\alpha = 0$, which corresponds to a white input sequence,
$x(n)$, the convergence is almost complete within 30 iterations. However, the other three
cases require more iterations before they converge to the minimum point of the
performance surface. This can be understood if one notes that the eigenvalues of $\mathbf{R}$ are
$\lambda_0 = 1 + \alpha$ and $\lambda_1 = 1 - \alpha$, and for $\alpha$ close to 1, the geometrical ratio factor $1 - 2\mu\lambda_1$
may be very close to 1. This introduces a slow mode of convergence along the $v'_1$-axis (i.e.,
in the direction defined by the eigenvector $\mathbf{q}_1$).
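The recursion of this example is easy to reproduce numerically. The following Python sketch is a minimal illustration under stated assumptions: the function names are hypothetical, and the initial point $\mathbf{w}(0) = [0 \; 0]^T$ is an assumed value, as the starting point used for Figure 5.4 is not stated in the text. It iterates Eq. (5.10) with $\mathbf{R}$ and $\mathbf{p}$ taken from Eqs. (5.27) and (5.28).

```python
import math


def steepest_descent_example(alpha, mu=0.05, iterations=30, w_init=(0.0, 0.0)):
    """Return the list of tap-weight pairs w(0), w(1), ..., w(iterations)."""
    R = [[1.0, alpha], [alpha, 1.0]]         # Eq. (5.27)
    p = [1.0 - 4.0 * alpha, alpha - 4.0]     # Eq. (5.28)
    w = list(w_init)                         # assumed starting point
    trajectory = [tuple(w)]
    for _ in range(iterations):
        # R w(k) - p, that is, half of the gradient in Eq. (5.7)
        r = [R[i][0] * w[0] + R[i][1] * w[1] - p[i] for i in range(2)]
        w = [w[i] - 2.0 * mu * r[i] for i in range(2)]   # Eq. (5.10)
        trajectory.append(tuple(w))
    return trajectory


def distance_to_plant(w):
    """Euclidean distance from w to the plant coefficients [1, -4]."""
    return math.hypot(w[0] - 1.0, w[1] + 4.0)


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 0.75, 0.9):
        w_end = steepest_descent_example(alpha)[-1]
        print(alpha, w_end, distance_to_plant(w_end))
```

Running the loop for the four values of $\alpha$ shows the remaining distance to the plant coefficients after 30 iterations growing with $\alpha$, in line with the slow mode discussed above.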


5.2 Learning Curve 


Although the recursive equations (5.19) and (5.24) provide detailed information about
the transient behavior of the steepest-descent algorithm, the multiparameter nature of the
equations makes it difficult to visualize such behavior graphically. Instead, it is more
convenient to consider the variation of the mean squared error (MSE), that is, the performance
function $\xi$, versus the number of iterations.

We define $\xi(k)$ as the value of the performance function $\xi$ when $\mathbf{w} = \mathbf{w}(k)$. Then, using
Eq. (4.94) of Chapter 4, we get

$\xi(k) = \xi_{\min} + \sum_{i=0}^{N-1} \lambda_i v_i'^2(k)$ (5.30)

where $\xi_{\min}$ is the minimum MSE. Substituting Eq. (5.20) in Eq. (5.30), we obtain

$\xi(k) = \xi_{\min} + \sum_{i=0}^{N-1} \lambda_i (1 - 2\mu\lambda_i)^{2k} v_i'^2(0)$ (5.31)

Figure 5.4 Trajectories showing how the filter tap weights vary when the steepest-descent
algorithm is used: (a) $\alpha = 0$, (b) $\alpha = 0.5$, (c) $\alpha = 0.75$, and (d) $\alpha = 0.9$. Each plot is based on 30
iterations and $\mu = 0.05$.


When $\mu$ is selected within the bounds defined by Eq. (5.23), the terms under the
summation in Eq. (5.31) converge to zero as $k$ increases. As a result, the minimum MSE is
achieved after a sufficient number of iterations.
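As a numerical cross-check of Eq. (5.31), the following Python sketch computes the excess MSE, $\xi(k) - \xi_{\min}$, for the two-tap case of Example 5.1 in two ways: from the modal sum of Eq. (5.31), and by iterating Eq. (5.14) directly and evaluating $\mathbf{v}^T(k)\mathbf{R}\mathbf{v}(k)$. This is a sketch under assumptions: the function names are hypothetical, and it uses the standard result that a 2-by-2 symmetric matrix with equal diagonal elements has the eigenvectors $[1 \; 1]^T/\sqrt{2}$ and $[1 \; -1]^T/\sqrt{2}$. Only the excess over $\xi_{\min}$ is computed, as the value of $\xi_{\min}$ is not needed for the comparison.

```python
import math


def excess_mse_modal(alpha, mu, w_init, w_opt, k):
    """Excess MSE from the modal sum in Eq. (5.31), two-tap case."""
    lam = (1.0 + alpha, 1.0 - alpha)                     # eigenvalues of R
    v = (w_init[0] - w_opt[0], w_init[1] - w_opt[1])     # v(0), Eq. (5.13)
    s = 1.0 / math.sqrt(2.0)
    v_prime = (s * (v[0] + v[1]), s * (v[0] - v[1]))     # Q^T v(0), Eq. (5.17)
    return sum(l * (1.0 - 2.0 * mu * l) ** (2 * k) * vp ** 2
               for l, vp in zip(lam, v_prime))


def excess_mse_direct(alpha, mu, w_init, w_opt, k):
    """Excess MSE v(k)^T R v(k), with v(k) obtained by iterating Eq. (5.14)."""
    v = [w_init[0] - w_opt[0], w_init[1] - w_opt[1]]
    for _ in range(k):
        Rv = (v[0] + alpha * v[1], alpha * v[0] + v[1])
        v = [v[0] - 2.0 * mu * Rv[0], v[1] - 2.0 * mu * Rv[1]]
    Rv = (v[0] + alpha * v[1], alpha * v[0] + v[1])
    return v[0] * Rv[0] + v[1] * Rv[1]


if __name__ == "__main__":
    for k in (0, 10, 50):
        print(k,
              excess_mse_modal(0.75, 0.05, (2.0, 2.0), (1.0, -4.0), k),
              excess_mse_direct(0.75, 0.05, (2.0, 2.0), (1.0, -4.0), k))
```

The two columns printed by the usage loop should agree to within rounding error, which confirms that the modes decouple as Eq. (5.31) states.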

The curve obtained by plotting $\xi(k)$ as a function of the iteration index, $k$, is called the
learning curve. A learning curve of the steepest-descent algorithm, as can be seen from
Eq. (5.31), consists of a sum of N exponentially decaying terms, each of which
corresponds to one of the modes of convergence of the algorithm. Each exponential term may
be characterized by a time constant, which is obtained as follows.

Let

$(1 - 2\mu\lambda_i)^{2k} = e^{-k/\tau_i}$ (5.32)

and define $\tau_i$ as the time constant associated with the exponential term $(1 - 2\mu\lambda_i)^{2k}$.
Solving Eq. (5.32) for $\tau_i$, we get

$\tau_i = \frac{-1}{2\ln(1 - 2\mu\lambda_i)}$ (5.33)

For small values of the step-size parameter $\mu$, when $2\mu\lambda_i \ll 1$, we note that

$\ln(1 - 2\mu\lambda_i) \approx -2\mu\lambda_i$ (5.34)

Substituting this in Eq. (5.33), we obtain

$\tau_i \approx \frac{1}{4\mu\lambda_i}$ (5.35)

This result, which is true for all values of $i = 0, 1, \ldots, N-1$, shows that, in general, the
number of time constants that characterize a learning curve is equal to the number of
filter taps. Furthermore, the time constants that are associated with the smaller eigenvalues
are larger than those associated with the larger eigenvalues.


Example 5.2 


Consider the modeling arrangement that was discussed in Example 5.1. The correlation
matrix $\mathbf{R}$ of the filter input is given by Eq. (5.27). The eigenvalues of $\mathbf{R}$ are

$\lambda_0 = 1 + \alpha$ and $\lambda_1 = 1 - \alpha$ (5.36)

Using these in Eq. (5.35), we obtain

$\tau_0 \approx \frac{1}{4\mu(1 + \alpha)}$ and $\tau_1 \approx \frac{1}{4\mu(1 - \alpha)}$ (5.37)

These are the time constants that characterize the learning curve of the modeling problem.
Figure 5.5 shows a learning curve of the modeling problem when $\mathbf{w}(0) = [2 \; 2]^T$, $\alpha = 0.75$,
and $\mu = 0.05$. For these values, we obtain

$\tau_0 \approx 2.86$ and $\tau_1 \approx 20$ (5.38)


The existence of two distinct time constants on the learning curve in Figure 5.5 is clearly
observed.

Figure 5.5 A learning curve of the modeling problem. The $\xi$ (MSE) axis is scaled linearly.

Figure 5.6 A learning curve of the modeling problem. The $\xi$ (MSE) axis is scaled logarithmically.

The two time constants can be observed more clearly if the $\xi$ axis is scaled
logarithmically. To see this, the learning curve of the modeling problem is plotted in Figure 5.6
with the $\xi$ axis scaled logarithmically. The two exponentials appear as two straight lines
on this plot. The first part of the plot, with a steep slope, is dominantly controlled by
$\tau_0$. The remaining part of the learning curve shows the contribution of the second
exponential, which is characterized by $\tau_1$. Estimates of the time constants may be obtained
by finding the number of iterations required for $\xi$ to drop by a factor of $e \approx 2.72$
(Napier's number) along each of the slopes. This gives

$\tau_0 \approx 3$ and $\tau_1 \approx 20$

which match well with those in Eq. (5.38).
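The time constants of this example can be evaluated with a few lines of Python. The sketch below is illustrative only, with hypothetical function names; it compares the exact expression of Eq. (5.33) with the small step-size approximation of Eq. (5.35) for $\alpha = 0.75$ and $\mu = 0.05$. The exact form assumes $0 < 2\mu\lambda_i < 1$, so that the logarithm is defined and negative.

```python
import math


def time_constant_exact(mu, lam):
    """Time constant from Eq. (5.33); requires 0 < 2*mu*lam < 1."""
    return -1.0 / (2.0 * math.log(1.0 - 2.0 * mu * lam))


def time_constant_approx(mu, lam):
    """Small step-size approximation of Eq. (5.35)."""
    return 1.0 / (4.0 * mu * lam)


if __name__ == "__main__":
    alpha, mu = 0.75, 0.05
    for name, lam in (("tau_0", 1.0 + alpha), ("tau_1", 1.0 - alpha)):
        print(name, time_constant_exact(mu, lam), time_constant_approx(mu, lam))
```

The approximation is closer for the smaller eigenvalue, for which $2\mu\lambda_i$ is further below 1.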


5.3 Effect of Eigenvalue Spread 


Our study in the last two sections shows that the performance of the steepest-descent
algorithm is highly dependent on the eigenvalues of the correlation matrix $\mathbf{R}$. In general,
a wider spread of the eigenvalues results in a poorer performance of the steepest-descent
algorithm. To gain further insight into this property of the steepest-descent algorithm, we
find the optimum value of the step-size parameter $\mu$, which results in the fastest possible
convergence of the steepest-descent algorithm.

We note that the speeds at which various modes of the steepest-descent algorithm
converge are determined by the size (absolute value) of the geometrical ratio factors
$1 - 2\mu\lambda_i$, for $i = 0, 1, \ldots, N-1$. For a given value of $\mu$, the transient time of the
steepest-descent algorithm is determined by the largest element in the set $\{|1 - 2\mu\lambda_i|, i =
0, 1, \ldots, N-1\}$. The optimum value of $\mu$, which minimizes the largest element in the
latter set, is obtained by looking at the two extreme cases that correspond to $\lambda_{\max}$ and
$\lambda_{\min}$, that is, the maximum and minimum eigenvalues of $\mathbf{R}$. Figure 5.7 shows the plots of
$|1 - 2\mu\lambda_{\min}|$ and $|1 - 2\mu\lambda_{\max}|$ as functions of $\mu$. The plots for the other eigenvalues lie
in between these two plots. From these plots, one can clearly see that the optimum value
of the step-size parameter $\mu$ corresponds to the point where the two plots meet. This is
the point highlighted as $\mu_{\mathrm{opt}}$ in Figure 5.7. It corresponds to the case where


$1 - 2\mu_{\mathrm{opt}}\lambda_{\min} = -(1 - 2\mu_{\mathrm{opt}}\lambda_{\max})$ (5.39)

Solving this for $\mu_{\mathrm{opt}}$, we obtain

$\mu_{\mathrm{opt}} = \frac{1}{\lambda_{\max} + \lambda_{\min}}$ (5.40)

Figure 5.7 The extreme cases showing how $|1 - 2\mu\lambda_i|$ varies as a function of the step-size
parameter $\mu$.

For this choice of the step-size parameter, $1 - 2\mu_{\mathrm{opt}}\lambda_{\min}$ is positive and $1 - 2\mu_{\mathrm{opt}}\lambda_{\max}$
is negative. These correspond to the overdamped and underdamped cases presented in
Figure 5.2a and b, respectively. However, the two modes converge at the same speed.
For $\mu = \mu_{\mathrm{opt}}$, the speed of convergence of the steepest-descent algorithm is determined
by the geometrical ratio factor

$\beta = 1 - 2\mu_{\mathrm{opt}}\lambda_{\min}$ (5.41)

Substituting Eq. (5.40) in Eq. (5.41), we obtain

$\beta = \frac{\lambda_{\max}/\lambda_{\min} - 1}{\lambda_{\max}/\lambda_{\min} + 1}$ (5.42)

This has a value that remains between 0 and 1. When $\lambda_{\max} = \lambda_{\min}$, $\beta = 0$ and the
steepest-descent algorithm can converge in one step. As the ratio $\lambda_{\max}/\lambda_{\min}$ increases, $\beta$ also
increases and becomes close to 1 when $\lambda_{\max}/\lambda_{\min}$ is large. Clearly, a value of $\beta$ close
to 1 corresponds to a slow mode of convergence. Thus, we note that the ratio $\lambda_{\max}/\lambda_{\min}$
plays a fundamental role in limiting the convergence performance of the steepest-descent
algorithm. This ratio is called eigenvalue spread.
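The dependence of $\mu_{\mathrm{opt}}$ and $\beta$ on the eigenvalue spread can be tabulated with the short Python sketch below. It is an illustration only; the function names are hypothetical, and the eigenvalues $1 \pm \alpha$ are those of the two-tap correlation matrix of Example 5.1, used here as an assumed test case.

```python
def optimum_step_size(lam_min, lam_max):
    """Step-size that equalizes the two extreme modes, Eq. (5.40)."""
    return 1.0 / (lam_max + lam_min)


def convergence_ratio(lam_min, lam_max):
    """Geometrical ratio factor at the optimum step-size, Eq. (5.42)."""
    spread = lam_max / lam_min          # eigenvalue spread
    return (spread - 1.0) / (spread + 1.0)


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 0.75, 0.9):
        lam_max, lam_min = 1.0 + alpha, 1.0 - alpha
        mu_opt = optimum_step_size(lam_min, lam_max)
        beta = convergence_ratio(lam_min, lam_max)
        # Both extreme modes have the same magnitude of ratio factor at mu_opt.
        print(alpha, mu_opt, beta,
              abs(1.0 - 2.0 * mu_opt * lam_min),
              abs(1.0 - 2.0 * mu_opt * lam_max))
```

For $\alpha = 0.9$ the eigenvalue spread is 19 and $\beta = 0.9$, so even with the optimum step-size each iteration reduces the slowest modes by only 10%.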

We may also recall from the last chapter that the values of $\lambda_{\max}$ and $\lambda_{\min}$ are closely
related to the maximum and minimum values of the power spectral density of the underlying
process. Noting this, we may say that the performance of the steepest-descent algorithm
is closely related to the shape of the power spectral density of the underlying input process.
A wide distribution of the energy of the underlying process within different frequency
bands introduces slow modes of convergence, which result in a poor performance of the
steepest-descent algorithm. When the underlying process contains very little energy in a
band of frequencies, we say the filter is weakly excited in that band. Weak excitation, as
we see, degrades the performance of the steepest-descent algorithm.


5.4 Newton’s Method 


Our discussions in the last few sections show that the steepest-descent algorithm may
suffer from slow modes of convergence, which arise due to the spread in the eigenvalues
of the correlation matrix $\mathbf{R}$. This means that if we can somehow get rid of the eigenvalue
spread, we can get much better convergence performance. This is exactly what Newton's
method does. To derive Newton's method for the quadratic case, we start from the
steepest-descent algorithm given in Eq. (5.10). Using $\mathbf{p} = \mathbf{R}\mathbf{w}_o$, Eq. (5.10) becomes

$\mathbf{w}(k+1) = \mathbf{w}(k) - 2\mu\mathbf{R}(\mathbf{w}(k) - \mathbf{w}_o)$ (5.43)

We may note that it is the presence of $\mathbf{R}$ in Eq. (5.43) that causes the eigenvalue spread
problem in the steepest-descent algorithm. Newton's method overcomes this problem by
replacing the scalar step-size parameter $\mu$ with a matrix step-size given by $\mu\mathbf{R}^{-1}$. The
resulting algorithm is

$\mathbf{w}(k+1) = \mathbf{w}(k) - \mu\mathbf{R}^{-1}\nabla_k\xi$ (5.44)

Figure 5.8 demonstrates the effect of the addition of $\mathbf{R}^{-1}$ in front of the gradient vector
in Newton's update Eq. (5.44). This has the effect of rotating the gradient vector to the
direction pointing toward the minimum point of the performance surface.

Substituting Eq. (5.7) in Eq. (5.44), we obtain

$\mathbf{w}(k+1) = \mathbf{w}(k) - 2\mu\mathbf{R}^{-1}(\mathbf{R}\mathbf{w}(k) - \mathbf{p}) = (1 - 2\mu)\mathbf{w}(k) + 2\mu\mathbf{R}^{-1}\mathbf{p}$ (5.45)

We also note that $\mathbf{R}^{-1}\mathbf{p}$ is equal to the optimum tap-weight vector $\mathbf{w}_o$. Using this in
Eq. (5.45), we obtain

$\mathbf{w}(k+1) = (1 - 2\mu)\mathbf{w}(k) + 2\mu\mathbf{w}_o$ (5.46)


Figure 5.8 The negative gradient vector and its correction by Newton's method.

Subtracting $\mathbf{w}_o$ from both sides of Eq. (5.46), we get

$\mathbf{w}(k+1) - \mathbf{w}_o = (1 - 2\mu)(\mathbf{w}(k) - \mathbf{w}_o)$ (5.47)

Starting with an initial value $\mathbf{w}(0)$ and iterating Eq. (5.47), we obtain

$\mathbf{w}(k) - \mathbf{w}_o = (1 - 2\mu)^k(\mathbf{w}(0) - \mathbf{w}_o)$ (5.48)

The original Newton's method selects the step-size parameter $\mu$ equal to 0.5. This
leads to convergence of $\mathbf{w}(k)$ to its optimum value, $\mathbf{w}_o$, in one iteration. In particular,
we note that setting $\mu = 0.5$ and $k = 1$ in Eq. (5.48), we obtain $\mathbf{w}(1) = \mathbf{w}_o$. However,
in actual implementation of adaptive filters where the exact values of $\nabla_k\xi$ and $\mathbf{R}^{-1}$ are
not available and they have to be estimated, one needs to use a step-size parameter much
smaller than 0.5. Thus, an evaluation of Newton's recursion (5.44) for values of $\mu \neq 0.5$
is instructive for our further study in the later chapters.

Using Eq. (5.48) and following the same line of derivations as in the case of the
steepest-descent method, it is straightforward to show that (Problem P5.5)

$$\xi(k) = \xi_{\min} + (1 - 2\mu)^{2k}(\xi(0) - \xi_{\min}) \tag{5.49}$$

where $\xi(k)$ is the value of the performance function, $\xi$, when $\mathbf{w} = \mathbf{w}(k)$.
From Eq. (5.49), we note that the stability of Newton’s algorithm is guaranteed when
$|1 - 2\mu| < 1$ or, equivalently,

$$0 < \mu < 1 \tag{5.50}$$


With reference to Eq. (5.49), we make the following observations: The transient behavior
of Newton’s algorithm is characterized by a single exponential whose corresponding
time constant is obtained by solving the equation

$$(1 - 2\mu)^{2k} = e^{-k/\tau} \tag{5.51}$$

When $2\mu \ll 1$, this gives

$$\tau \approx \frac{1}{4\mu} \tag{5.52}$$

This result shows that Newton’s method has only one mode of convergence and that is
solely determined by its step-size parameter $\mu$.
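
The contrast between the two recursions can be checked numerically. The following is a minimal sketch, not code from the book (which uses MATLAB): the two-tap matrix R, the optimum vector w_o = [1, -4], the step size, and all helper names are assumptions chosen for illustration. It iterates Eq. (5.10) and Eq. (5.45) side by side and also confirms the one-step convergence obtained with mu = 0.5.

```python
# Sketch: steepest descent (Eq. 5.10) versus Newton's recursion (Eq. 5.45)
# on an assumed two-tap quadratic performance surface.

def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inv2(M):
    """Inverse of a 2-by-2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def steepest_descent_step(w, R, p, mu):
    # w(k+1) = w(k) - 2*mu*(R w(k) - p)
    Rw = mat_vec(R, w)
    return [w[i] - 2 * mu * (Rw[i] - p[i]) for i in range(len(w))]

def newton_step(w, R_inv, p, mu):
    # w(k+1) = (1 - 2*mu) w(k) + 2*mu*R^{-1} p
    w_opt = mat_vec(R_inv, p)
    return [(1 - 2 * mu) * w[i] + 2 * mu * w_opt[i] for i in range(len(w))]

def err_norm(w, w_ref):
    return sum((a - b) ** 2 for a, b in zip(w, w_ref)) ** 0.5

alpha = 0.9                                # assumed; eigenvalues are 1.9 and 0.1
R = [[1.0, alpha], [alpha, 1.0]]
w_o = [1.0, -4.0]                          # assumed optimum tap weights
p = mat_vec(R, w_o)                        # p = R w_o
R_inv = inv2(R)

# mu = 0.5 reaches the optimum in a single iteration
w_one_shot = newton_step([0.0, 0.0], R_inv, p, mu=0.5)

# A small step size: Newton has a single mode (1 - 2*mu)^k, while steepest
# descent is held back by its slow mode (1 - 2*mu*lambda_min)^k
mu, n_iter = 0.05, 30
w_sd, w_nt = [0.0, 0.0], [0.0, 0.0]
for _ in range(n_iter):
    w_sd = steepest_descent_step(w_sd, R, p, mu)
    w_nt = newton_step(w_nt, R_inv, p, mu)

print("Newton, mu = 0.5, one step:", w_one_shot)
print("error after 30 steps, steepest descent:", err_norm(w_sd, w_o))
print("error after 30 steps, Newton:", err_norm(w_nt, w_o))
```

With these assumed values, the Newton error after k iterations is exactly (1 - 2 mu)^k times the initial error norm, as Eq. (5.48) states, whereas the steepest-descent error is dominated by the mode associated with the smallest eigenvalue.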


5.5 An Alternative Interpretation of Newton’s Algorithm 


Further insight into the operation of Newton’s algorithm is developed by giving an alternative
derivation of it. This derivation uses the Karhunen-Loève transform (KLT), which
was introduced in the last chapter.
For an observation vector $\mathbf{x}(n)$ with real-valued elements, the KLT is defined by the
equation

$$\mathbf{x}'(n) = \mathbf{Q}^T\mathbf{x}(n) \tag{5.53}$$


where $\mathbf{Q}$ is the N-by-N matrix whose columns are the eigenvectors $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{N-1}$ of
the correlation matrix $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$. We recall from Chapter 4 that the elements of
the transformed vector $\mathbf{x}'(n)$, denoted by $x_0'(n), x_1'(n), \ldots, x_{N-1}'(n)$, constitute a set of
mutually uncorrelated random variables. Furthermore, Eq. (4.68) implies that

$$E[x_i^{\prime 2}(n)] = \lambda_i, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{5.54}$$

where the $\lambda_i$s are the eigenvalues of the correlation matrix $\mathbf{R}$.
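We define the vector $\mathbf{x}^{\prime n}(n)$ whose elements are

$$x_i^{\prime n}(n) = \lambda_i^{-1/2} x_i'(n), \quad \text{for } i = 0, 1, \ldots, N-1 \tag{5.55}$$

where the superscript n signifies the fact that $\mathbf{x}^{\prime n}(n)$ is normalized to the power of unity
(see Eq. (5.57), below). These equations may collectively be written as

$$\mathbf{x}^{\prime n}(n) = \boldsymbol{\Lambda}^{-1/2}\mathbf{x}'(n) \tag{5.56}$$

where $\boldsymbol{\Lambda}$ is a diagonal matrix consisting of the eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$. It is
straightforward to show that

$$\mathbf{R}^{\prime n} = E[\mathbf{x}^{\prime n}(n)\mathbf{x}^{\prime n T}(n)] = \mathbf{I} \tag{5.57}$$

where $\mathbf{I}$ is the N-by-N identity matrix.
We also define

$$\mathbf{w}^{\prime n} = \boldsymbol{\Lambda}^{1/2}\mathbf{Q}^T\mathbf{w} \tag{5.58}$$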


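and note that

$$\mathbf{w}^{\prime n T}\mathbf{x}^{\prime n}(n) = \mathbf{w}^T\mathbf{Q}\boldsymbol{\Lambda}^{1/2}\boldsymbol{\Lambda}^{-1/2}\mathbf{Q}^T\mathbf{x}(n) = \mathbf{w}^T\mathbf{x}(n) \tag{5.59}$$

This result shows that a filter with an input vector $\mathbf{x}(n)$ and output $y(n) = \mathbf{w}^T\mathbf{x}(n)$ may
alternatively be realized by $\mathbf{x}^{\prime n}(n)$ and $\mathbf{w}^{\prime n}$ as the filter input and tap-weight vectors,
respectively. The steepest-descent algorithm for this realization may be written, following Eq. (5.11), as

$$\mathbf{w}^{\prime n}(k+1) = (\mathbf{I} - 2\mu\mathbf{R}^{\prime n})\mathbf{w}^{\prime n}(k) + 2\mu\mathbf{p}^{\prime n} \tag{5.60}$$

where

$$\mathbf{p}^{\prime n} = E[\mathbf{x}^{\prime n}(n)d(n)]. \tag{5.61}$$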


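As $\mathbf{R}^{\prime n} = \mathbf{I}$, Eq. (5.60) simplifies to

$$\mathbf{w}^{\prime n}(k+1) = (1 - 2\mu)\mathbf{w}^{\prime n}(k) + 2\mu\mathbf{w}_o^{\prime n} \tag{5.62}$$

where $\mathbf{w}_o^{\prime n} = (\mathbf{R}^{\prime n})^{-1}\mathbf{p}^{\prime n} = \mathbf{p}^{\prime n}$ is the optimum value of the tap-weight vector $\mathbf{w}^{\prime n}$.
Comparing this with Newton’s algorithm (5.46), we find that the steepest-descent algorithm
in this case works just like Newton’s algorithm.

Next, we show that the recursive equation (5.60) is nothing but Newton’s recursive
equation (5.44) written in a slightly different form. For this, we use Eq. (5.58) in Eq. (5.62)
to obtain

$$\boldsymbol{\Lambda}^{1/2}\mathbf{Q}^T\mathbf{w}(k+1) = (1 - 2\mu)\boldsymbol{\Lambda}^{1/2}\mathbf{Q}^T\mathbf{w}(k) + 2\mu\boldsymbol{\Lambda}^{1/2}\mathbf{Q}^T\mathbf{w}_o \tag{5.63}$$

Premultiplying both sides of this equation by $(\boldsymbol{\Lambda}^{1/2}\mathbf{Q}^T)^{-1} = \mathbf{Q}\boldsymbol{\Lambda}^{-1/2}$ (as $(\mathbf{Q}^T)^{-1} = \mathbf{Q}$),
we get Eq. (5.46), which can easily be converted to Eq. (5.44).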
The above development shows that Newton’s algorithm may be viewed as a steepest-descent
algorithm for the transformed input signal. The eigenvalue spread problem associated
with the steepest-descent algorithm is resolved by decorrelating the filter input
samples (through their corresponding KLT) followed by a power-normalization procedure.
This is a whitening process; that is, the input samples are decorrelated and then normalized
to unit power before the filtering process.
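
The equivalence can be verified numerically for a small case. The sketch below is illustrative only and is written in Python rather than the book's MATLAB; it assumes the two-tap correlation matrix R = [[1, alpha], [alpha, 1]], whose eigenvectors are [1, 1]/sqrt(2) and [1, -1]/sqrt(2) with eigenvalues 1 + alpha and 1 - alpha, and an assumed optimum vector w_o = [1, -4]. It runs steepest descent on the normalized weights of Eq. (5.58), maps the result back to the original coordinates, and compares it with Newton's recursion (5.46).

```python
import math

# Assumed illustrative values (not taken from the text)
alpha, mu, n_iter = 0.75, 0.05, 20
lam = [1 + alpha, 1 - alpha]               # eigenvalues of R
s = 1 / math.sqrt(2)
Q = [[s, s], [s, -s]]                      # columns are the eigenvectors of R
R = [[1.0, alpha], [alpha, 1.0]]
w_o = [1.0, -4.0]
p = [R[i][0] * w_o[0] + R[i][1] * w_o[1] for i in range(2)]   # p = R w_o

def to_normalized(w):
    # Eq. (5.58): w'^n = Lambda^{1/2} Q^T w
    return [math.sqrt(lam[i]) * (Q[0][i] * w[0] + Q[1][i] * w[1]) for i in range(2)]

def from_normalized(wn):
    # Inverse mapping: w = Q Lambda^{-1/2} w'^n
    t = [wn[i] / math.sqrt(lam[i]) for i in range(2)]
    return [Q[j][0] * t[0] + Q[j][1] * t[1] for j in range(2)]

# p'^n = E[x'^n(n) d(n)] = Lambda^{-1/2} Q^T p
p_n = [(Q[0][i] * p[0] + Q[1][i] * p[1]) / math.sqrt(lam[i]) for i in range(2)]

w_newton = [0.0, 0.0]
w_n = to_normalized([0.0, 0.0])
for _ in range(n_iter):
    # Newton's recursion in the original coordinates, Eq. (5.46)
    w_newton = [(1 - 2 * mu) * w_newton[i] + 2 * mu * w_o[i] for i in range(2)]
    # Steepest descent in the whitened coordinates, Eq. (5.60) with R'^n = I
    w_n = [w_n[i] - 2 * mu * (w_n[i] - p_n[i]) for i in range(2)]

w_from_whitened = from_normalized(w_n)
max_diff = max(abs(a - b) for a, b in zip(w_newton, w_from_whitened))
print("largest difference between the two trajectories:", max_diff)
```

The two trajectories agree to within floating-point rounding, which is the numerical counterpart of the step from Eq. (5.63) back to Eq. (5.46).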


Problems 


P5.1 Use the method of steepest descent to solve the equation $\mathbf{R}\mathbf{w} = \mathbf{p}$ for the following
choices of $\mathbf{R}$ and $\mathbf{p}$. For each case, find the range of the step-size parameter $\mu$ for
which the steepest-descent algorithm is convergent. Also, for each case, find the
value of $\mu$ that results in the fastest convergence of the steepest-descent algorithm.
You may write a MATLAB code for finding the solutions.

(i)
$$\mathbf{R} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad \mathbf{p} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

(ii)
$$\mathbf{R} = \begin{bmatrix} 2 & 1 & 0.5 \\ 1 & 2 & 1 \\ 0.5 & 1 & 2 \end{bmatrix}, \quad \mathbf{p} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$$

P5.2 By applying the method of steepest descent to the canonical form of the performance
function, that is, Eq. (4.93), suggest an alternative derivation of Eq. (5.25).

P5.3 Show that when the steepest-descent algorithm (5.10) is used, the time constants
that control the variation of the tap weights of a transversal filter are

$$\tau_i \approx \frac{1}{2\mu\lambda_i}, \quad \text{for } i = 0, 1, \ldots, N-1$$

P5.4 Give a detailed derivation of Eq. (5.25) in the case where the underlying signals are
complex-valued.

P5.5 Give a detailed derivation of Eq. (5.49).

P5.6 Show that if, in the steepest-descent algorithm, the tap-weight vector is initialized
to zero,

$$\mathbf{w}(k) = [\mathbf{I} - (\mathbf{I} - 2\mu\mathbf{R})^k]\mathbf{w}_o$$

where $\mathbf{w}_o$ is the optimum tap-weight vector.

P5.7 $\mathbf{R}$ is a correlation matrix with the eigenvalues $\lambda_i$, $i = 0, 1, \ldots, N-1$. Find the
eigenvalues of the matrix $\mathbf{G} = \mathbf{I} + \mathbf{R} + \mathbf{R}^2$.

P5.8 $\mathbf{R}$ is a correlation matrix with the eigenvalues $\lambda_i$, $i = 0, 1, \ldots, N-1$. Prove that
if $0 < \lambda_i < 1$, for $i = 0, 1, \ldots, N-1$,

(i)
$$\lim_{n\to\infty}(\mathbf{I} + \mathbf{R} + \mathbf{R}^2 + \cdots + \mathbf{R}^n) = (\mathbf{I} - \mathbf{R})^{-1}$$

(ii)
$$\lim_{n\to\infty}(\mathbf{I} + (\mathbf{I} - \mathbf{R}) + (\mathbf{I} - \mathbf{R})^2 + \cdots + (\mathbf{I} - \mathbf{R})^n) = \mathbf{R}^{-1}$$

P5.9 Consider the modeling problem depicted in Figure P5.6. Note that the input to
the model is a noisy version of the plant input. The additive noise at the model
input, $\nu_i(n)$, is white and its variance is $\sigma_i^2$. The sequence $\nu_o(n)$ is the plant noise.
It is uncorrelated with $u(n)$ and $\nu_i(n)$. The correlation matrix of the plant input,
$u(n)$, is denoted by $\mathbf{R}$. The model has to be selected so that the MSE at the model
output is minimized.

(i) Find the correlation matrix of the model input and show that it shares the
same set of eigenvectors with $\mathbf{R}$.
(ii) Derive the corresponding Wiener-Hopf equation.
(iii) Show that the difference between the plant tap-weight vector, $\mathbf{w}_o$, and its
estimate, $\hat{\mathbf{w}}_o$, which is obtained through the Wiener-Hopf equation derived
in (ii), is

$$\mathbf{w}_o - \hat{\mathbf{w}}_o = \sigma_i^2\sum_{i=0}^{N-1}\frac{\mathbf{q}_i^T\mathbf{p}}{\lambda_i(\lambda_i + \sigma_i^2)}\,\mathbf{q}_i$$

where the $\mathbf{q}_i$s are the eigenvectors of $\mathbf{R}$ and $\mathbf{p}$ is the cross-correlation between
the model input and the desired output.
(iv) Show that

$$\text{MMSE} = \sigma_o^2 + \sigma_i^2\sum_{i=0}^{N-1}\frac{(\mathbf{q}_i^T\mathbf{p})^2}{\lambda_i(\lambda_i + \sigma_i^2)}$$

(v) If the steepest-descent algorithm is used to find $\hat{\mathbf{w}}_o$, find the time constants
of the resulting learning curve. How do these time constants vary with $\sigma_i^2$?
(vi) Discuss the eigenvalue spread problem as $\sigma_i^2$ varies.

P5.10 Consider a transversal filter with the input and tap-weight vectors $\mathbf{x}(n)$ and $\mathbf{w}$,
respectively, and output

$$y(n) = \mathbf{w}^T\mathbf{x}(n)$$

Define the vector

$$\tilde{\mathbf{x}}(n) = \mathbf{R}^{-1/2}\mathbf{x}(n)$$

where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$. Let $\tilde{\mathbf{x}}(n)$ be the input to a filter whose output is obtained
through the equation

$$\tilde{y}(n) = \tilde{\mathbf{w}}^T\tilde{\mathbf{x}}(n)$$

where $\tilde{\mathbf{w}}$ is the filter tap-weight vector.

(i) Derive an equation for $\tilde{\mathbf{w}}$ so that the two outputs $y(n)$ and $\tilde{y}(n)$ are the same.
(ii) Derive a steepest-descent update equation for the tap-weight vector $\tilde{\mathbf{w}}$.
(iii) Derive an equation that demonstrates the variation of the tap weights of the
filter as the steepest-descent algorithm derived in Part (ii) is running.
(iv) Find the time constants of the learning curve of the algorithm.
(v) Show that the update equation derived in (ii) is equivalent to Newton’s
algorithm.

Figure P5.10 (block diagram; the only legible block label is "model")

P5.11 Consider a two-tap Wiener filter, which is characterized by the following
parameters

$$\mathbf{R} = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix} \quad \text{and} \quad \mathbf{p} = \begin{bmatrix} 2 \\ ? \end{bmatrix}$$

(the second element of $\mathbf{p}$ is illegible in the source), where $\mathbf{R}$ is the correlation matrix of the
filter tap-input vector, $\mathbf{x}(n)$, and $\mathbf{p}$ is the cross-correlation between $\mathbf{x}(n)$ and the desired output, $d(n)$.

(i) Find the range of the step-size parameter $\mu$ that ensures convergence of the
steepest-descent algorithm. Does this result depend on the cross-correlation
vector $\mathbf{p}$?
(ii) Run the steepest-descent algorithm for $\mu$ = 0.05, 0.1, 0.5, and 1 and plot the
corresponding trajectories in the $(w_0, w_1)$-plane.
(iii) For $\mu = 0.05$, plot $w_0(k)$ and $w_1(k)$, separately, as functions of the iteration
index, k.
(iv) On the plots obtained in (iii), you should find that the variation of each
tap weight is characterized by two distinct time constants. This implies that the
variation of each tap weight may be decomposed into a summation of two
distinct exponential series. Explain this observation.

P5.12 For the modeling problem discussed in Examples 5.1 and 5.2, develop a MATLAB
program to present the contour plots of the performance surface and plot the
trajectories of the steepest-descent and Newton’s algorithm on the same plane for
$\mu = 0.05$ and $\alpha$ = 0, 0.5, 0.75, and 0.9. Comment on your observations.

Hint: To generate the contour plots, follow the procedure discussed in Section 4.3.

P5.13 Consider the modeling problem depicted in Figure 5.3. Let $x(n) = 1$, for all
values of n.

(i) Derive the steepest-descent algorithm that may be used to find the model
parameters.
(ii) Derive an equation for the performance function of the present problem, and
plot the contours that show its performance surface.
(iii) Run the algorithm that you have derived in (i) and find the model parameters
that it converges to.
(iv) On the performance surface obtained in (ii), plot the trajectory showing the
variation of the model parameters. Comment on your observation.


P5.14 Repeat Problem P5.13 for the case where $x(n) = (-1)^n$.


P5.15 All the derivations in this chapter were for the case that all the underlying pro- 
cesses were real-valued. In this problem, you are guided to repeat some of the 
results for the case where the underlying processes are complex-valued. 


(i) Starting with the definition $\xi = E[|e(n)|^2] = E[e(n)e^*(n)]$, show that in the
case where the underlying processes are complex-valued

$$\xi = E[|d(n)|^2] - \mathbf{w}^H\mathbf{p} - \mathbf{p}^H\mathbf{w} + \mathbf{w}^H\mathbf{R}\mathbf{w}.$$

Here, the definitions for $\mathbf{w}$, $\mathbf{p}$, and $\mathbf{R}$ follow those in Section 3.5.
(ii) Show that

$$\nabla_{\mathbf{w}}^{c}\xi = 2(\mathbf{R}\mathbf{w} - \mathbf{p}).$$


(iii) Using the result of (ii), present a steepest-descent update equation for the 
case of this problem and compare it with Eq. (5.10). 

(iv) Continuing with the result in (iii), can we say Eqs. (5.18), (5.23), and (5.25) 
are also valid in the case where the underlying processes are complex-valued? 
Why? Explain. 


6 


LMS Algorithm 


The celebrated least-mean-square (LMS) algorithm is introduced in this chapter. The LMS
algorithm, which was first proposed by Widrow and Hoff in 1960, is the most widely used
adaptive filtering algorithm in practice. This wide range of applications of the LMS
algorithm can be attributed to its simplicity and robustness to signal statistics. The LMS
algorithm has also been studied and extended by many researchers, and over the years
many modifications to it have been proposed. In this and the subsequent few chapters,
we introduce and study several such modifications.


6.1 Derivation of LMS Algorithm 


Figure 6.1 depicts an N-tap transversal adaptive filter. The filter input, $x(n)$, desired
output, $d(n)$, and the filter output

$$y(n) = \sum_{i=0}^{N-1} w_i(n)x(n-i) \tag{6.1}$$

are assumed to be real-valued sequences. The tap weights $w_0(n), w_1(n), \ldots, w_{N-1}(n)$
are selected so that the difference (error)

$$e(n) = d(n) - y(n) \tag{6.2}$$


is minimized in some sense. It may be noted that the filter tap weights are explicitly 
indicated to be functions of the time index n. This signifies the fact that in an adaptive 
filter, in general, tap weights are time varying, as they are continuously being adapted, so 
that any variations in the signal statistics could be tracked. The LMS algorithm changes
(adapts) the filter tap weights, so that e(n) is minimized in the mean-square sense, thus 
the name LMS. When the processes x(n) and d(n) are jointly stationary, this algorithm 
converges to a set of tap weights, which, on average, are equal to the Wiener—Hopf 
solution discussed in Chapter 3. In other words, the LMS algorithm is a practical scheme 
for realizing Wiener filters, without explicitly solving the Wiener—Hopf equation. It is a 
sequential algorithm that can be used to adapt the tap weights of a filter by continuous 
observation of its input, x(n), and desired output, d(n). 

Figure 6.1 An N-tap transversal adaptive filter.

The conventional LMS algorithm is a stochastic implementation of the steepest-descent
algorithm. It simply replaces the cost function $\xi = E[e^2(n)]$ by its instantaneous coarse
estimate $\hat{\xi}(n) = e^2(n)$. Substituting $\hat{\xi}(n) = e^2(n)$ for $\xi$ in the steepest-descent recursion
(5.9) of Chapter 5, and replacing the iteration index k by the time index n, we obtain

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\nabla e^2(n) \tag{6.3}$$


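where $\mathbf{w}(n) = [w_0(n)\; w_1(n)\; \cdots\; w_{N-1}(n)]^T$, $\mu$ is the algorithm step-size parameter, and $\nabla$
is the gradient operator defined as the column vector

$$\nabla = \left[\frac{\partial}{\partial w_0}\;\; \frac{\partial}{\partial w_1}\;\; \cdots\;\; \frac{\partial}{\partial w_{N-1}}\right]^T \tag{6.4}$$

We note that the ith element of the gradient vector $\nabla e^2(n)$ is

$$\frac{\partial e^2(n)}{\partial w_i} = 2e(n)\frac{\partial e(n)}{\partial w_i} \tag{6.5}$$

Substituting Eq. (6.2) in the last factor on the right-hand side of Eq. (6.5) and noting that
$d(n)$ is independent of $w_i$, we obtain

$$\frac{\partial e^2(n)}{\partial w_i} = -2e(n)\frac{\partial y(n)}{\partial w_i} \tag{6.6}$$

Substituting for $y(n)$ from Eq. (6.1), we get

$$\frac{\partial e^2(n)}{\partial w_i} = -2e(n)x(n-i) \tag{6.7}$$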


Using Eqs. (6.4) and (6.7), we obtain

$$\nabla e^2(n) = -2e(n)\mathbf{x}(n) \tag{6.8}$$

where $\mathbf{x}(n) = [x(n)\; x(n-1)\; \cdots\; x(n-N+1)]^T$. Substituting this result in Eq. (6.3),
we get

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n)\mathbf{x}(n) \tag{6.9}$$

Table 6.1 Summary of the LMS algorithm.

Input:  Tap-weight vector, $\mathbf{w}(n)$,
        input vector, $\mathbf{x}(n)$,
        and desired output, $d(n)$
Output: Filter output, $y(n)$,
        tap-weight vector update, $\mathbf{w}(n+1)$

1. Filtering:                     $y(n) = \mathbf{w}^T(n)\mathbf{x}(n)$
2. Error estimation:              $e(n) = d(n) - y(n)$
3. Tap-weight vector adaptation:  $\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n)\mathbf{x}(n)$


This is referred to as the LMS recursion. It suggests a simple procedure for recursive 
adaptation of the filter coefficients after arrival of every new input sample, x(n), and 
its corresponding desired output sample, d(n). Equations (6.1), (6.2), and (6.9), in this 
order, specify the three steps required to complete each iteration of the LMS algorithm. 
Equation (6.1) is referred to as filtering. It is performed to obtain the filter output. 
Equation (6.2) is used to calculate the estimation error. Equation (6.9) is the tap-weight
adaptation recursion. Table 6.1 gives a summary of the LMS algorithm.
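
The three steps of Table 6.1 translate almost line for line into code. The following Python sketch is illustrative only (the book's own programs are in MATLAB): the function name `lms_filter`, the white Gaussian test input, the noise-free two-tap plant, and the step size used in the usage lines are all assumptions made for the example.

```python
import random

def lms_filter(x, d, num_taps, mu):
    """Run the LMS recursion on input x and desired output d.

    Returns the filter outputs, the errors, and the final tap weights.
    The tap weights start at zero, which is an assumption of this sketch.
    """
    w = [0.0] * num_taps
    x_vec = [0.0] * num_taps              # [x(n), x(n-1), ..., x(n-N+1)]
    y_out, e_out = [], []
    for x_n, d_n in zip(x, d):
        x_vec = [x_n] + x_vec[:-1]        # shift the new sample into the tapped delay line
        y_n = sum(wi * xi for wi, xi in zip(w, x_vec))         # 1. filtering, Eq. (6.1)
        e_n = d_n - y_n                                        # 2. error estimation, Eq. (6.2)
        w = [wi + 2 * mu * e_n * xi for wi, xi in zip(w, x_vec)]  # 3. adaptation, Eq. (6.9)
        y_out.append(y_n)
        e_out.append(e_n)
    return y_out, e_out, w

# Usage: identify an assumed noise-free plant 1 - 4 z^{-1} from white input
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = [x[n] - 4.0 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
y, e, w = lms_filter(x, d, num_taps=2, mu=0.01)
print("estimated tap weights:", w)
```

Because the assumed plant has the same length as the adaptive filter and no noise is added, the tap weights settle very close to 1 and -4; with plant noise they would instead fluctuate around those values.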

The eminent feature of the LMS algorithm, which has made it the most popular adaptive
filtering scheme, is its simplicity. Its implementation requires $2N + 1$ multiplications ($N$
multiplications for calculating the output $y(n)$, one to obtain $(2\mu) \times e(n)$, and $N$ for the
scalar-by-vector multiplication $(2\mu e(n)) \times \mathbf{x}(n)$) and $2N$ additions. Another important feature
of the LMS algorithm, which is equally important from an implementation point of view,
is its stable and robust performance against different signal conditions. This aspect of
the LMS algorithm will be studied in the later chapters, when it is compared with other
alternative adaptive filtering algorithms. The major problem of the LMS recursion (6.9)
is its slow convergence when the underlying input process is highly colored. This aspect
of the LMS algorithm is discussed in the next section, and solutions to it will be given
in the later chapters.


6.2 Average Tap-Weight Behavior of the LMS Algorithm 


Consider the case where the filter input, $x(n)$, and its desired output, $d(n)$, are stationary.
In that case, the optimum tap-weight vector, $\mathbf{w}_o$, of the transversal Wiener filter is fixed
and can be obtained according to the Wiener-Hopf equation (3.24). Subtracting $\mathbf{w}_o$ from
both sides of Eq. (6.9), we obtain

$$\mathbf{v}(n+1) = \mathbf{v}(n) + 2\mu e(n)\mathbf{x}(n) \tag{6.10}$$

where $\mathbf{v}(n) = \mathbf{w}(n) - \mathbf{w}_o$ is the weight-error vector. We also note that

$$\begin{aligned} e(n) &= d(n) - \mathbf{w}^T(n)\mathbf{x}(n) \\ &= d(n) - \mathbf{x}^T(n)\mathbf{w}(n) \\ &= d(n) - \mathbf{x}^T(n)\mathbf{w}_o - \mathbf{x}^T(n)(\mathbf{w}(n) - \mathbf{w}_o) \\ &= e_o(n) - \mathbf{x}^T(n)\mathbf{v}(n) \end{aligned} \tag{6.11}$$

where

$$e_o(n) = d(n) - \mathbf{x}^T(n)\mathbf{w}_o \tag{6.12}$$


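is the estimation error when the filter tap weights are optimum. Substituting Eq. (6.11) in
Eq. (6.10) and rearranging, we obtain

$$\mathbf{v}(n+1) = (\mathbf{I} - 2\mu\mathbf{x}(n)\mathbf{x}^T(n))\mathbf{v}(n) + 2\mu e_o(n)\mathbf{x}(n) \tag{6.13}$$

where $\mathbf{I}$ is the identity matrix. Taking expectation on both sides of Eq. (6.13), we get

$$\begin{aligned} E[\mathbf{v}(n+1)] &= E[(\mathbf{I} - 2\mu\mathbf{x}(n)\mathbf{x}^T(n))\mathbf{v}(n)] + 2\mu E[e_o(n)\mathbf{x}(n)] \\ &= E[(\mathbf{I} - 2\mu\mathbf{x}(n)\mathbf{x}^T(n))\mathbf{v}(n)] \end{aligned} \tag{6.14}$$

where the last equality follows from the fact that $E[e_o(n)\mathbf{x}(n)] = \mathbf{0}$, according to the
principle of orthogonality.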

The main difficulty with any further analysis of the right-hand side of Eq. (6.14) is
that it involves evaluation of the third-order moment vector $E[\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{v}(n)]$, which,
in general, is a difficult mathematical task. Different approaches have been adopted by
researchers to overcome this mathematical hurdle. The most widely used analysis assumes
that the present observation data samples $(\mathbf{x}(n), d(n))$ are independent of the past
observations $(\mathbf{x}(n-1), d(n-1))$, $(\mathbf{x}(n-2), d(n-2))$, ...; see, for example, Widrow et al.
(1976) and Feuer and Weinstein (1985). This is referred to as the independence assumption.
Using the independence assumption, one can argue that as $\mathbf{v}(n)$ depends only on
the past observations $(\mathbf{x}(n-1), d(n-1))$, $(\mathbf{x}(n-2), d(n-2))$, ..., it is independent of
$\mathbf{x}(n)$, and thus

$$E[\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{v}(n)] = E[\mathbf{x}(n)\mathbf{x}^T(n)]E[\mathbf{v}(n)] \tag{6.15}$$


We may note that in most practical cases, the independence assumption is questionable.
For example, in the case of a length-N transversal filter, the input vectors

$$\mathbf{x}(n) = [x(n)\; x(n-1)\; \cdots\; x(n-N+1)]^T$$

and

$$\mathbf{x}(n-1) = [x(n-1)\; x(n-2)\; \cdots\; x(n-N)]^T$$

have $(N-1)$ terms in common, out of N. Nevertheless, experience with the LMS algorithm
has shown that the predictions made by the independence assumption match the
computer simulations and the actual performance of the LMS algorithm in practice.
This may be explained as follows. The tap-weight vector $\mathbf{w}(n)$ at any given time has been
affected by the whole past history of the observation data samples $(\mathbf{x}(n-1), d(n-1))$,
$(\mathbf{x}(n-2), d(n-2))$, .... When the step-size parameter $\mu$ is small, the share of the last N
observations in the present value of $\mathbf{w}(n)$ is small, and thus we may say $\mathbf{x}(n)$ and $\mathbf{w}(n)$ are
weakly dependent. This clearly leads to Eq. (6.15), with some degree of approximation, if
we can assume that the observation samples that are apart from each other at a distance
of N or greater are weakly dependent. This reasoning seems to be more appealing than
the independence assumption. In any case, we use Eq. (6.15) and other similar equations
(approximations), which will be introduced later, to proceed with our analysis in this book.
Substituting Eq. (6.15) in Eq. (6.14), we obtain

$$E[\mathbf{v}(n+1)] = (\mathbf{I} - 2\mu\mathbf{R})E[\mathbf{v}(n)] \tag{6.16}$$

where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$ is the correlation matrix of the input vector $\mathbf{x}(n)$.

Comparing the recursions (6.16) and (5.14), we find that they are of exactly the 
same mathematical form. The deterministic weight-error vector v(k) in Eq. (5.14) 
of the steepest-descent algorithm is replaced by the averaged weight-error vector E[v(n)] 
of the LMS algorithm. This suggests that, on average, the LMS algorithm behaves just 
like the steepest-descent algorithm. In particular, similar to the steepest-descent algorithm, 
the LMS algorithm is controlled by N modes of convergence, which are characterized 
by the eigenvalues of the correlation matrix R. Consequently, the convergence behavior 
of the LMS algorithm is directly linked to the eigenvalue spread of the correlation 
matrix R. Furthermore, recalling the relationship between the eigenvalue spread of R 
and the power spectrum of x(n), we can say that the convergence of the LMS algorithm 
is directly related to the flatness in the spectral content of the underlying input process. 
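
The modal structure of the mean recursion can be made concrete with a short numerical sketch. The Python code below is an illustration under assumed values, not material from the book: it takes the two-tap correlation matrix R = [[1, alpha], [alpha, 1]] (eigenvalues 1 + alpha and 1 - alpha), an assumed initial mean weight-error vector, and iterates Eq. (6.16). The result is then compared with the prediction that each mode decays as (1 - 2 mu lambda_i)^n.

```python
import math

alpha, mu, n_iter = 0.9, 0.01, 150         # assumed illustrative values
R = [[1.0, alpha], [alpha, 1.0]]
lam = [1 + alpha, 1 - alpha]               # eigenvalues of R
s = 1 / math.sqrt(2)
q = [[s, s], [s, -s]]                      # q[i] is the eigenvector for lam[i]

def mean_error_step(v):
    # Eq. (6.16): E[v(n+1)] = (I - 2*mu*R) E[v(n)]
    Rv = [R[i][0] * v[0] + R[i][1] * v[1] for i in range(2)]
    return [v[i] - 2 * mu * Rv[i] for i in range(2)]

def project(v):
    # Components of v along the eigenvectors of R
    return [q[i][0] * v[0] + q[i][1] * v[1] for i in range(2)]

v0 = [-1.0, 4.0]                           # assumed: w(0) = 0 and w_o = [1, -4]
v = v0[:]
for _ in range(n_iter):
    v = mean_error_step(v)

modes_sim = project(v)
modes_pred = [(1 - 2 * mu * lam[i]) ** n_iter * m for i, m in enumerate(project(v0))]
print("simulated modes:", modes_sim)
print("predicted modes:", modes_pred)
```

For alpha = 0.9 the mode tied to the small eigenvalue 0.1 has decayed by only about a quarter after 150 iterations, while the mode tied to 1.9 has all but vanished, which is the eigenvalue-spread effect described above.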

Following a similar procedure as in Chapter 5, by manipulating Eq. (6.16), one can
show that $E[\mathbf{v}(n)]$ converges to zero when $\mu$ remains within the range

$$0 < \mu < \frac{1}{\lambda_{\max}} \tag{6.17}$$

where $\lambda_{\max}$ is the maximum eigenvalue of $\mathbf{R}$. However, we should point out here that
the above range does not necessarily guarantee the stability of the LMS algorithm. The
convergence of the LMS algorithm requires convergence of the mean of $\mathbf{w}(n)$ toward $\mathbf{w}_o$
and also convergence of the variance of the elements of $\mathbf{w}(n)$ to some limited values. As we
shall show later, to guarantee the stability of the LMS algorithm, the latter requirement
imposes a much more stringent condition on the size of $\mu$. Furthermore, we may note that the
independence assumption used to obtain Eq. (6.16) was based on the assumption that $\mu$
was very small. The upper limit of $\mu$ in Eq. (6.17) may badly violate this assumption.
Thus, the validity of Eq. (6.17), even for the convergence of $E[\mathbf{w}(n)]$, is questionable.


Example 6.1 


Consider the modeling problem of Example 5.1, which is repeated in Figure 6.2 for
convenience. As in Example 5.1, the input signal, $x(n)$, is generated by passing a white
noise signal, $\nu(n)$, through a coloring filter with the system function

$$H(z) = \frac{\sqrt{1 - \alpha^2}}{1 - \alpha z^{-1}} \tag{6.18}$$

where $\alpha$ is a real-valued constant in the range of $-1$ to $+1$. The plant is a two-tap FIR
system with the system function $P(z) = 1 - 4z^{-1}$. An adaptive filter with the system
function $W(z) = w_0 + w_1 z^{-1}$ is used to identify the plant. Here, the LMS algorithm is
used to find the optimum values of the tap weights $w_0$ and $w_1$. We want to see, as the
iteration number increases, how the tap weights $w_0$ and $w_1$ converge toward the plant
coefficients, 1 and $-4$, respectively. We examine this for different values of the parameter
$\alpha$. We recall from Example 5.1 that the parameter $\alpha$ controls the eigenvalue spread of the
correlation matrix $\mathbf{R}$ of the input samples to the filter $W(z)$.

Figure 6.2 A modeling problem.

Figure 6.3a-d presents four plots showing typical trajectories of the LMS algorithm,
which have been obtained for the values of $\alpha$ = 0, 0.5, 0.75, and 0.9, respectively. Also
shown in the figures are the contour plots that highlight the performance surface of the
filter. The results presented are for $\mu = 0.01$ and 150 iterations, for all cases. In comparison
with the parameters used in Figure 5.4 of Example 5.1, here $\mu$ is selected five
times smaller, while the number of iterations is chosen five times larger. Comparing the
results here with those of Figure 5.4, we can clearly see that, as predicted above, the LMS
algorithm, on average, follows the same trajectories as the steepest-descent algorithm. In
particular, the convergence of the LMS algorithm along the steepest-descent slope of the
performance surface is clearly observed. Also, we note that in the case $\alpha = 0$, which
corresponds to a white input sequence, the convergence of the LMS algorithm is almost
complete within 150 iterations. However, the other three cases require some more iterations
before they converge to the vicinity of the minimum point of the performance
surface. This, as was noted in Example 5.1, can be understood if one notes that the eigenvalues
of the correlation matrix $\mathbf{R}$ of the input samples to the adaptive filter are $\lambda_0 = 1 + \alpha$
and $\lambda_1 = 1 - \alpha$, and for $\alpha$ close to 1, the time constant $\tau_1 = 1/(4\mu\lambda_1)$ may be very large.
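
An experiment of this kind can be sketched as follows. This is not the book's MATLAB program: it is a Python illustration in which the function names, the random seed, the unit-variance Gaussian driving noise, and the absence of plant noise are all assumptions. The input is colored by the recursion implied by Eq. (6.18), and the two-tap LMS recursion is run for 150 iterations with mu = 0.01 for each value of alpha.

```python
import math
import random

def colored_input(alpha, n_samples, rng):
    # x(n) = alpha*x(n-1) + sqrt(1 - alpha^2)*nu(n), i.e. H(z) of Eq. (6.18)
    # driven by unit-variance white noise (an assumption of this sketch)
    gain = math.sqrt(1 - alpha ** 2)
    prev = rng.gauss(0.0, 1.0)             # start in the stationary regime
    x = [prev]
    for _ in range(n_samples - 1):
        prev = alpha * prev + gain * rng.gauss(0.0, 1.0)
        x.append(prev)
    return x

def lms_trajectory(alpha, mu=0.01, n_iter=150, seed=1):
    """Return the list of (w0, w1) pairs visited by the LMS algorithm."""
    rng = random.Random(seed)
    x = colored_input(alpha, n_iter + 1, rng)
    w = [0.0, 0.0]
    path = [tuple(w)]
    for n in range(1, n_iter + 1):
        x_vec = [x[n], x[n - 1]]
        d = x[n] - 4.0 * x[n - 1]          # plant P(z) = 1 - 4 z^{-1}, noise-free here
        e = d - (w[0] * x_vec[0] + w[1] * x_vec[1])
        w = [w[i] + 2 * mu * e * x_vec[i] for i in range(2)]
        path.append(tuple(w))
    return path

def distance_to_plant(w):
    return math.hypot(w[0] - 1.0, w[1] + 4.0)

final_distance = {a: distance_to_plant(lms_trajectory(a)[-1])
                  for a in (0.0, 0.5, 0.75, 0.9)}
for a, dist in final_distance.items():
    print("alpha = %.2f: distance to (1, -4) after 150 iterations = %.3f" % (a, dist))
```

Plotting each returned path over the contours of the performance surface would give pictures of the kind described for Figure 6.3; the printed distances give a rough numerical view of how far each run still is from the plant coefficients.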


6.3 MSE Behavior of the LMS Algorithm 


In this section, the variation of $\xi(n) = E[e^2(n)]$ as the LMS algorithm is being iterated is
studied.¹ This study is directly related to the convergence of the LMS algorithm.

¹ The derivations provided in this section follow the work of Feuer and Weinstein (1985). Prior to Feuer and
Weinstein (1985), Horowitz and Senne (1981) had also arrived at similar results, using a different approach.

Figure 6.3 Trajectories showing how the filter tap weights vary when the LMS algorithm is used:
(a) $\alpha = 0$, (b) $\alpha = 0.5$, (c) $\alpha = 0.75$, and (d) $\alpha = 0.9$. Each plot is based on 150 iterations and
$\mu = 0.01$.


In the derivations that follow, it is assumed that

1. the input, $x(n)$, and desired output, $d(n)$, are zero-mean stationary processes;
2. $x(n)$ and $d(n)$ are jointly Gaussian-distributed random variables, for all n; and
3. at time n, the tap-weight vector $\mathbf{w}(n)$ is independent of the input vector $\mathbf{x}(n)$ and the
desired output $d(n)$.

The validity of the last assumption is justified for small values of the step-size parameter
$\mu$, as was discussed in the previous section. This, as was noted before, is referred to as
the independence assumption. Assumption 1 greatly simplifies the analysis. Assumption 2
results in some simplification in the final results, as the third- and higher-order moments,
which appear in the derivations, can be expressed in terms of the second-order moments
when the underlying random variables are jointly Gaussian.


6.3.1 Learning Curve 
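We note from Eq. (6.11) that the estimation error, $e(n)$, can be expressed as

$$e(n) = e_o(n) - \mathbf{v}^T(n)\mathbf{x}(n) \tag{6.19}$$

Squaring both sides of Eq. (6.19) and taking the expectation on both sides, we obtain

$$E[e^2(n)] = E[e_o^2(n)] + E[(\mathbf{v}^T(n)\mathbf{x}(n))^2] - 2E[e_o(n)\mathbf{v}^T(n)\mathbf{x}(n)] \tag{6.20}$$

Noting that $\mathbf{v}^T(n)\mathbf{x}(n) = \mathbf{x}^T(n)\mathbf{v}(n)$ and using the independence assumption, the second
term on the right-hand side of Eq. (6.20) can be expanded as²

$$\begin{aligned} E[(\mathbf{v}^T(n)\mathbf{x}(n))^2] &= E[\mathbf{v}^T(n)\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{v}(n)] \\ &= E[\mathbf{v}^T(n)E[\mathbf{x}(n)\mathbf{x}^T(n)]\mathbf{v}(n)] \\ &= E[\mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)] \end{aligned} \tag{6.21}$$

Noting that $E[(\mathbf{v}^T(n)\mathbf{x}(n))^2]$ is a scalar and using Eq. (6.21), we may also write

$$\begin{aligned} E[(\mathbf{v}^T(n)\mathbf{x}(n))^2] &= \mathrm{tr}[E[(\mathbf{v}^T(n)\mathbf{x}(n))^2]] \\ &= \mathrm{tr}[E[\mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)]] \\ &= E[\mathrm{tr}[\mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)]] \end{aligned} \tag{6.22}$$

where $\mathrm{tr}[\cdot]$ denotes the trace of a matrix, and in writing the last identity we have noted
that “trace” and “expectation” are linear operators and, thus, could be exchanged. This
result can be further simplified using the following result from matrix algebra. For any
pair of N-by-M and M-by-N matrices $\mathbf{A}$ and $\mathbf{B}$,

$$\mathrm{tr}[\mathbf{A}\mathbf{B}] = \mathrm{tr}[\mathbf{B}\mathbf{A}] \tag{6.23}$$

Using this identity, we obtain

$$\begin{aligned} E[\mathrm{tr}[\mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)]] &= E[\mathrm{tr}[\mathbf{v}(n)\mathbf{v}^T(n)\mathbf{R}]] \\ &= \mathrm{tr}[E[\mathbf{v}(n)\mathbf{v}^T(n)]\mathbf{R}] \end{aligned} \tag{6.24}$$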
Defining the correlation matrix of the weight-error vector $\mathbf{v}(n)$ as

$$\mathbf{K}(n) = E[\mathbf{v}(n)\mathbf{v}^T(n)] \tag{6.25}$$

the above result reduces to

$$E[(\mathbf{v}^T(n)\mathbf{x}(n))^2] = \mathrm{tr}[\mathbf{K}(n)\mathbf{R}] \tag{6.26}$$

² We note that when x and y are two independent random variables,
$E[xy] = E[x]E[y] = E[xE[y]]$.
Also,
$E[x^2y^2] = E[x^2]E[y^2] = E[x^2E[y^2]] = E[xE[y^2]x]$.
A similar procedure is used to arrive at Eq. (6.21) and in other similar derivations that appear in the rest of this book.


Using the independence assumption and noting that $e_o(n)$ is a scalar, the last term on the
right-hand side of Eq. (6.20) can be written as

$$\begin{aligned} E[e_o(n)\mathbf{v}^T(n)\mathbf{x}(n)] &= E[\mathbf{v}^T(n)\mathbf{x}(n)e_o(n)] \\ &= E[\mathbf{v}^T(n)]E[\mathbf{x}(n)e_o(n)] \\ &= 0 \end{aligned} \tag{6.27}$$

where the last step follows from the principle of orthogonality, which states that the
optimal estimation error and the input data samples to a Wiener filter are orthogonal
(uncorrelated), that is, $E[e_o(n)\mathbf{x}(n)] = \mathbf{0}$.

Using Eqs. (6.26) and (6.27) in Eq. (6.20), we obtain

$$\xi(n) = E[e^2(n)] = \xi_{\min} + \mathrm{tr}[\mathbf{K}(n)\mathbf{R}] \tag{6.28}$$

where $\xi_{\min} = E[e_o^2(n)]$, that is, the minimum mean-squared error (MSE) at the filter
output.

This result may be written in a more convenient form for our further analysis later, if
we recall from Chapter 4 that the correlation matrix $\mathbf{R}$ may be decomposed as

$$\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T \tag{6.29}$$

where $\mathbf{Q}$ is the N-by-N matrix whose columns are the eigenvectors of $\mathbf{R}$, and $\boldsymbol{\Lambda}$ is
the diagonal matrix consisting of the eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$ of $\mathbf{R}$. Substituting
Eq. (6.29) in Eq. (6.28) and using the identity (6.23), we obtain

$$\xi(n) = \xi_{\min} + \mathrm{tr}[\mathbf{K}'(n)\boldsymbol{\Lambda}] \tag{6.30}$$

where $\mathbf{K}'(n) = \mathbf{Q}^T\mathbf{K}(n)\mathbf{Q}$. Furthermore, using Eq. (6.25), and recalling the definition
$\mathbf{v}'(n) = \mathbf{Q}^T\mathbf{v}(n)$ from Chapter 4, we find that

$$\mathbf{K}'(n) = E[\mathbf{v}'(n)\mathbf{v}^{\prime T}(n)] \tag{6.31}$$

Also, we recall that $\mathbf{v}'(n)$ is the weight-error vector in the coordinates defined by the basis
vectors specified by the eigenvectors of $\mathbf{R}$.
Noting that $\boldsymbol{\Lambda}$ is a diagonal matrix, Eq. (6.30) can be expanded as

$$\xi(n) = \xi_{\min} + \sum_{i=0}^{N-1}\lambda_i k_{ii}'(n) \tag{6.32}$$

where $k_{ij}'(n)$ is the ijth element of the matrix $\mathbf{K}'(n)$.

The plot of $\xi(n)$ versus the time index n, defined by Eq. (6.28) or its alternative forms
in Eq. (6.30) or Eq. (6.32), is called the learning curve of the LMS algorithm. It is very
similar to the learning curve of the steepest-descent algorithm because, according to the
derivations in the previous section, the LMS algorithm on average follows the same
trajectory as the steepest-descent algorithm. The noisy variations of the filter tap weights
in the case of the LMS algorithm introduce some additional error and push up its learning
curve compared to that of the steepest-descent algorithm. However, when the step-size
parameter, $\mu$, is small (which is usually the case in practice), one finds that the difference
between the two curves is noticeable only when they have converged and approached
their steady state. The following example shows this.


Example 6.2 


Figure 6.4 shows the learning curves of the LMS algorithm and the steepest-descent
algorithm for the modeling problem discussed in Examples 5.1 and 6.1, when $\alpha = 0.75$
and $\mu = 0.01$. For both cases, the filter tap weights have been initialized with $w_0(0) =
w_1(0) = 0$. The learning curve of the steepest-descent algorithm has been obtained by
inserting the numerical values of the parameters in Eq. (5.31). The learning curve of the
LMS algorithm is obtained by an ensemble average of the sequence $e^2(n)$ over 1000 independent
runs. We note that the two curves match closely. The learning curve of the LMS
algorithm remains slightly above the learning curve of the steepest-descent algorithm.
This is because of the use of noisy estimates of the gradient vector in the LMS algorithm.
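
A comparison of this kind can be reproduced in outline with the Python sketch below. It is an illustration, not the program used for Figure 6.4: the plant-noise variance of 0.1, the number of runs (200 rather than 1000, to keep the run short), the seed, and the variable names are assumptions. The steepest-descent curve is written in its modal form, xi(k) = xi_min + sum_i lambda_i (1 - 2 mu lambda_i)^(2k) v_i'(0)^2, using the eigenvalues 1 + alpha and 1 - alpha and the eigenvectors [1, 1]/sqrt(2) and [1, -1]/sqrt(2) of the two-tap correlation matrix.

```python
import math
import random

ALPHA, MU, N_ITER, N_RUNS = 0.75, 0.01, 150, 200
NOISE_STD = math.sqrt(0.1)                 # assumed plant-noise level, not from the text
W_PLANT = (1.0, -4.0)

def one_run(rng):
    """Squared-error sequence e^2(n) of a single LMS run started from w = 0."""
    gain = math.sqrt(1 - ALPHA ** 2)
    x_prev = rng.gauss(0.0, 1.0)
    w = [0.0, 0.0]
    sq_err = []
    for _ in range(N_ITER):
        x_now = ALPHA * x_prev + gain * rng.gauss(0.0, 1.0)
        d = W_PLANT[0] * x_now + W_PLANT[1] * x_prev + NOISE_STD * rng.gauss(0.0, 1.0)
        e = d - (w[0] * x_now + w[1] * x_prev)
        sq_err.append(e * e)
        w = [w[0] + 2 * MU * e * x_now, w[1] + 2 * MU * e * x_prev]
        x_prev = x_now
    return sq_err

rng = random.Random(0)
runs = [one_run(rng) for _ in range(N_RUNS)]
lms_curve = [sum(r[n] for r in runs) / N_RUNS for n in range(N_ITER)]

# Steepest-descent learning curve in modal form; v(0) = w(0) - w_o = [-1, 4]
lam = (1 + ALPHA, 1 - ALPHA)
v0_modes = ((-1.0 + 4.0) / math.sqrt(2), (-1.0 - 4.0) / math.sqrt(2))
xi_min = NOISE_STD ** 2
sd_curve = [xi_min + sum(lam[i] * (1 - 2 * MU * lam[i]) ** (2 * k) * v0_modes[i] ** 2
                         for i in range(2))
            for k in range(N_ITER)]

for n in (0, 50, 100, 149):
    print("n = %3d  LMS (ensemble average) = %.3f  steepest descent = %.3f"
          % (n, lms_curve[n], sd_curve[n]))
```

With only a few hundred runs the ensemble average is still visibly noisy, so the two printed columns should be compared for their overall trend rather than sample by sample.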


We emphasize that, despite the noisy variation of the filter tap weights, the learning
curve of the LMS algorithm matches closely with the theoretical results of the steepest-descent
algorithm. In particular, Eq. (5.31) is applicable and the time constant equation

$$\tau_i = \frac{1}{4\mu\lambda_i} \tag{6.33}$$


can be used for predicting the transient behavior of the LMS algorithm. 
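
As a quick numerical illustration of Eq. (6.33), the following Python lines evaluate the time constants for the parameter values of Example 6.2, using the eigenvalues 1 + alpha and 1 - alpha quoted in Example 6.1. The function name `lms_time_constants` is a hypothetical helper introduced only for this sketch.

```python
def lms_time_constants(eigenvalues, mu):
    # Eq. (6.33): tau_i = 1 / (4 * mu * lambda_i)
    return [1.0 / (4.0 * mu * lam) for lam in eigenvalues]

alpha, mu = 0.75, 0.01
eigenvalues = [1 + alpha, 1 - alpha]       # 1.75 and 0.25
taus = lms_time_constants(eigenvalues, mu)
print("time constants (iterations):", taus)
```

The two modes have time constants of about 14 and 100 iterations, so the learning curve of this example shows a fast initial drop followed by a much slower tail governed by the smaller eigenvalue.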


Figure 6.4 Learning curves of the steepest-descent algorithm and LMS algorithm for the modeling
problem of Figure 6.2 and the parameter values of $\alpha = 0.75$ and $\mu = 0.01$.
6.3.2 Weight-Error Correlation Matrix


The weight-error correlation matrix $\mathbf{K}(n)$ plays an important role in the study of the
LMS algorithm. From Eq. (6.28), we note that the value of $\xi(n)$ is directly related to
$\mathbf{K}(n)$. Equation (6.28) implies that the stability of the LMS algorithm is guaranteed if,
as n increases, the elements of $\mathbf{K}(n)$ remain bounded. Also, from Eqs. (6.30) and (6.32),
we note that $\mathbf{K}'(n)$ may equivalently be used in the study of the convergence of the LMS
algorithm. Here, we develop a time-update equation for $\mathbf{K}'(n)$.

Multiplying both sides of Eq. (6.13) from the left by $\mathbf{Q}^T$, using the definitions $\mathbf{v}'(n) =
\mathbf{Q}^T\mathbf{v}(n)$ and $\mathbf{x}'(n) = \mathbf{Q}^T\mathbf{x}(n)$, and rearranging the result, we obtain

$$\mathbf{v}'(n+1) = (\mathbf{I} - 2\mu\mathbf{x}'(n)\mathbf{x}^{\prime T}(n))\mathbf{v}'(n) + 2\mu e_o(n)\mathbf{x}'(n) \tag{6.34}$$


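Next, we multiply both sides of Eq. (6.34) from the right by their respective transposes,
take statistical expectation of the result, and expand to obtain

$$\begin{aligned} \mathbf{K}'(n+1) = \mathbf{K}'(n) &- 2\mu E[\mathbf{x}'(n)\mathbf{x}^{\prime T}(n)\mathbf{v}'(n)\mathbf{v}^{\prime T}(n)] \\ &- 2\mu E[\mathbf{v}'(n)\mathbf{v}^{\prime T}(n)\mathbf{x}'(n)\mathbf{x}^{\prime T}(n)] \\ &+ 4\mu^2 E[\mathbf{x}'(n)\mathbf{x}^{\prime T}(n)\mathbf{v}'(n)\mathbf{v}^{\prime T}(n)\mathbf{x}'(n)\mathbf{x}^{\prime T}(n)] \\ &+ 2\mu E[e_o(n)\mathbf{x}'(n)\mathbf{v}^{\prime T}(n)] \\ &+ 2\mu E[e_o(n)\mathbf{v}'(n)\mathbf{x}^{\prime T}(n)] \\ &- 4\mu^2 E[e_o(n)\mathbf{x}'(n)\mathbf{v}^{\prime T}(n)\mathbf{x}'(n)\mathbf{x}^{\prime T}(n)] \\ &- 4\mu^2 E[e_o(n)\mathbf{x}'(n)\mathbf{x}^{\prime T}(n)\mathbf{v}'(n)\mathbf{x}^{\prime T}(n)] \\ &+ 4\mu^2 E[e_o^2(n)\mathbf{x}'(n)\mathbf{x}^{\prime T}(n)]. \end{aligned} \tag{6.35}$$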
We note that the independence assumption (which states that $\mathbf{v}(n)$ is independent of
$\mathbf{x}(n)$ and $d(n)$) is also applicable to the transformed (prime) variables in Eq. (6.35).
That is, the random vector $\mathbf{v}'(n)$ is independent of $\mathbf{x}'(n)$ and $d(n)$. This is
immediately observed if we note that $\mathbf{x}'(n)$ and $\mathbf{v}'(n)$ are independently obtained from
$\mathbf{x}(n)$ and $\mathbf{v}(n)$, respectively. Also, the assumption that $d(n)$ and $\mathbf{x}(n)$ are zero-mean
and jointly Gaussian-distributed implies that $d(n)$ and $\mathbf{x}'(n)$ are also zero-mean and jointly
Gaussian. Furthermore, using the definition $\mathbf{x}'(n) = \mathbf{Q}^T\mathbf{x}(n)$, we note that the principle
of orthogonality, that is, $E[e_o(n)\mathbf{x}(n)] = \mathbf{0}$, may also be written as

$$E[e_o(n)\mathbf{x}'(n)] = \mathbf{0} \qquad (6.36)$$

Noting that Eq. (6.36) shows that $e_o(n)$ and $\mathbf{x}'(n)$ are uncorrelated, and the fact that $d(n)$
and $\mathbf{x}'(n)$ and, thus, $e_o(n)$ and $\mathbf{x}'(n)$ are jointly Gaussian, one can say that the random
variables $e_o(n)$ and $\mathbf{x}'(n)$ are independent of each other. (We recall that when random
variables $x$ and $y$ are jointly Gaussian and uncorrelated, they are also independent; see
Papoulis, 1991.) Also, the independence of $\mathbf{v}'(n)$ from $d(n)$ and $\mathbf{x}(n)$ implies that $\mathbf{v}'(n)$
and $e_o(n)$ are independent, since $e_o(n)$ depends only on $d(n)$ and $\mathbf{x}(n)$. With these points in
mind, the expectations on the right-hand side of Eq. (6.35) can be simplified as follows:

$$E[\mathbf{x}'(n)\mathbf{x}'^T(n)\mathbf{v}'(n)\mathbf{v}'^T(n)] = E[\mathbf{x}'(n)\mathbf{x}'^T(n)]\,E[\mathbf{v}'(n)\mathbf{v}'^T(n)] = \boldsymbol{\Lambda}\mathbf{K}'(n) \qquad (6.37)$$

where we have noted that $E[\mathbf{x}'(n)\mathbf{x}'^T(n)] = \boldsymbol{\Lambda}$. Similarly

$$E[\mathbf{v}'(n)\mathbf{v}'^T(n)\mathbf{x}'(n)\mathbf{x}'^T(n)] = \mathbf{K}'(n)\boldsymbol{\Lambda} \qquad (6.38)$$


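Simplification of the third expectation requires some algebraic manipulations. These are
provided in Appendix 6A. The result is

$$E[\mathbf{x}'(n)\mathbf{x}'^T(n)\mathbf{v}'(n)\mathbf{v}'^T(n)\mathbf{x}'(n)\mathbf{x}'^T(n)] = 2\boldsymbol{\Lambda}\mathbf{K}'(n)\boldsymbol{\Lambda} + \mathrm{tr}[\boldsymbol{\Lambda}\mathbf{K}'(n)]\boldsymbol{\Lambda} \qquad (6.39)$$

Using the independence of $e_o(n)$, $\mathbf{x}'(n)$, and $\mathbf{v}'(n)$ and noting that $e_o(n)$ has zero mean,
we get

$$E[e_o(n)\mathbf{x}'(n)\mathbf{v}'^T(n)] = E[e_o(n)]\,E[\mathbf{x}'(n)\mathbf{v}'^T(n)] = \mathbf{0} \qquad (6.40)$$

where $\mathbf{0}$ denotes the N-by-N zero matrix. Similarly,

$$E[e_o(n)\mathbf{v}'(n)\mathbf{x}'^T(n)] = \mathbf{0} \qquad (6.41)$$

$$E[e_o(n)\mathbf{x}'(n)\mathbf{v}'^T(n)\mathbf{x}'(n)\mathbf{x}'^T(n)] = \mathbf{0} \qquad (6.42)$$

$$E[e_o(n)\mathbf{x}'(n)\mathbf{x}'^T(n)\mathbf{v}'(n)\mathbf{x}'^T(n)] = \mathbf{0} \qquad (6.43)$$

and

$$E[e_o^2(n)\mathbf{x}'(n)\mathbf{x}'^T(n)] = E[e_o^2(n)]\,E[\mathbf{x}'(n)\mathbf{x}'^T(n)] = \xi_{\min}\boldsymbol{\Lambda} \qquad (6.44)$$

Substituting Eqs. (6.37) to (6.44) in Eq. (6.35), we obtain

$$\mathbf{K}'(n+1) = \mathbf{K}'(n) - 2\mu\left(\boldsymbol{\Lambda}\mathbf{K}'(n) + \mathbf{K}'(n)\boldsymbol{\Lambda}\right) + 8\mu^2\boldsymbol{\Lambda}\mathbf{K}'(n)\boldsymbol{\Lambda} + 4\mu^2\,\mathrm{tr}[\boldsymbol{\Lambda}\mathbf{K}'(n)]\boldsymbol{\Lambda} + 4\mu^2\xi_{\min}\boldsymbol{\Lambda} \qquad (6.45)$$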


The difference equation (6.45) is difficult to handle. However, the fact that $\boldsymbol{\Lambda}$ is a
diagonal matrix can be used to simplify the analysis. Consider the $i$th diagonal element of
$\mathbf{K}'(n)$ and note that its corresponding time-update equation, obtained from Eq. (6.45), is

$$k'_{ii}(n+1) = \rho_i k'_{ii}(n) + 4\mu^2\lambda_i\sum_{j=0}^{N-1}\lambda_j k'_{jj}(n) + 4\mu^2\xi_{\min}\lambda_i \qquad (6.46)$$

where

$$\rho_i = 1 - 4\mu\lambda_i + 8\mu^2\lambda_i^2 \qquad (6.47)$$

and we have noted that

$$\mathrm{tr}[\boldsymbol{\Lambda}\mathbf{K}'(n)] = \sum_{j=0}^{N-1}\lambda_j k'_{jj}(n) \qquad (6.48)$$

The important feature of Eq. (6.46) to be noted is that the update of $k'_{ii}(n)$ is independent
of the off-diagonal elements of $\mathbf{K}'(n)$. Furthermore, we note that as $\mathbf{K}'(n)$ is a correlation
matrix, $k'^2_{ij}(n) \le k'_{ii}(n)k'_{jj}(n)$, for all values of $i$ and $j$. This suggests that the convergence
of the diagonal elements of $\mathbf{K}'(n)$ is sufficient to ensure the convergence of all elements
of $\mathbf{K}'(n)$, which, in turn, is required to guarantee the stability of the LMS algorithm.
Thus, we concentrate on Eq. (6.46), for $i = 0, 1, \ldots, N-1$.

Let us define the column vectors

$$\mathbf{k}'(n) = [k'_{00}(n)\;\; k'_{11}(n)\;\; \cdots\;\; k'_{N-1,N-1}(n)]^T \qquad (6.49)$$

and

$$\boldsymbol{\lambda} = [\lambda_0\;\; \lambda_1\;\; \cdots\;\; \lambda_{N-1}]^T \qquad (6.50)$$

and the matrix

$$\mathbf{F} = \mathrm{diag}[\rho_0, \rho_1, \ldots, \rho_{N-1}] + 4\mu^2\boldsymbol{\lambda}\boldsymbol{\lambda}^T \qquad (6.51)$$

where diag[···] refers to a diagonal matrix consisting of the indicated elements. Considering
these definitions and the time-update equation (6.46), for $i = 0, 1, \ldots, N-1$, we get

$$\mathbf{k}'(n+1) = \mathbf{F}\mathbf{k}'(n) + 4\mu^2\xi_{\min}\boldsymbol{\lambda} \qquad (6.52)$$


The difference equation (6.52) can be used to study the stability of the LMS algorithm.
As was noted before, the stability of the LMS algorithm is guaranteed if the elements
of $\mathbf{K}(n)$ (or, equivalently, the elements of $\mathbf{k}'(n)$) remain bounded as $n$ increases. The
necessary and sufficient condition for this to happen is that all the eigenvalues of the
coefficient matrix $\mathbf{F}$ of Eq. (6.52) be less than 1 in magnitude. Feuer and Weinstein
(1985) have discussed the eigenvalues of $\mathbf{F}$ and given the condition required to keep
the LMS algorithm stable. Here, we will comment on the stability of the LMS algorithm
in an indirect way. This is done after we find an expression for the excess MSE of the
LMS algorithm, which is defined in the following.
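The recursion of Eq. (6.52) is easy to iterate numerically. The sketch below does so row by row, using Eq. (6.46), for one small and one large step size. The eigenvalues, the minimum MSE, the initial vector and the two step sizes are assumed values chosen only for illustration.

```python
def rho(mu, lam):
    """Diagonal coefficient rho_i of Eq. (6.47)."""
    return 1.0 - 4.0 * mu * lam + 8.0 * mu ** 2 * lam ** 2


def step_k(k, mu, lams, xi_min):
    """One update of Eq. (6.46) for every i, i.e. one step of Eq. (6.52)."""
    coupling = sum(l * kj for l, kj in zip(lams, k))  # tr[Lambda K'(n)], Eq. (6.48)
    return [rho(mu, l) * ki + 4.0 * mu ** 2 * l * (coupling + xi_min)
            for l, ki in zip(lams, k)]


def iterate_k(k0, mu, lams, xi_min, n_iter):
    """Apply the recursion n_iter times starting from k'(0) = k0."""
    k = list(k0)
    for _ in range(n_iter):
        k = step_k(k, mu, lams, xi_min)
    return k


# Assumed, illustrative values (not taken from the text).
lams = [0.5, 1.0, 2.0]
xi_min = 0.001
k0 = [1.0, 1.0, 1.0]

k_stable = iterate_k(k0, 0.02, lams, xi_min, 5000)    # small step size
excess_mse = sum(l * k for l, k in zip(lams, k_stable))
k_unstable = iterate_k(k0, 0.3, lams, xi_min, 200)    # step size chosen too large

print("steady-state k' (mu = 0.02):", k_stable)
print("largest element after 200 steps (mu = 0.3):", max(k_unstable))
```

With the small step size, the elements of k'(n) settle to a bounded steady-state vector; with the large one, they grow without bound, which is the instability that the eigenvalue condition on F rules out.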


6.3.3 Excess MSE and Misadjustment 


We note that even when the filter tap-weight vector $\mathbf{w}(n)$ approaches its optimal value,
$\mathbf{w}_o$, and the mean of the stochastic gradient vector $\nabla e^2(n)$ tends to zero, the instantaneous
value of this gradient may not be zero. This results in a perturbation of the tap-weight
vector $\mathbf{w}(n)$ around its optimal value, $\mathbf{w}_o$, even after convergence of the algorithm. This,
in turn, increases the MSE of the LMS algorithm to a level above the minimum MSE,
which would be obtained if the filter tap weights were fixed at their optimal values. This
additional error is called excess MSE. In other words, the excess MSE of an adaptive
filter is defined as the difference between its steady-state MSE and its minimum MSE.

The steady-state MSE of the LMS algorithm can be found from Eq. (6.28) or, equivalently,
Eq. (6.30) or Eq. (6.32) by letting the time index $n$ tend to infinity. Thus,
subtracting $\xi_{\min}$ from both sides of Eq. (6.28), we obtain

$$\xi_{\text{excess}} = \mathrm{tr}[\mathbf{K}(\infty)\mathbf{R}] \qquad (6.53)$$

where $\xi_{\text{excess}}$ denotes the excess MSE. Alternatively, if Eq. (6.32) is used, we get

$$\xi_{\text{excess}} = \sum_{i=0}^{N-1}\lambda_i k'_{ii}(\infty) = \boldsymbol{\lambda}^T\mathbf{k}'(\infty) \qquad (6.54)$$

When the LMS algorithm is convergent, $\mathbf{k}'(n)$ converges to a bounded steady-state
value and we can say $\mathbf{k}'(n+1) = \mathbf{k}'(n)$, when $n \to \infty$. Noting this, from Eq. (6.52), we
obtain

$$\mathbf{k}'(\infty) = 4\mu^2\xi_{\min}(\mathbf{I} - \mathbf{F})^{-1}\boldsymbol{\lambda} \qquad (6.55)$$

Substituting this in Eq. (6.54), we get

$$\xi_{\text{excess}} = 4\mu^2\xi_{\min}\boldsymbol{\lambda}^T(\mathbf{I} - \mathbf{F})^{-1}\boldsymbol{\lambda} \qquad (6.56)$$


We note that $\xi_{\text{excess}}$ is proportional to $\xi_{\min}$. This is intuitively understandable, if we note
that when $\mathbf{w}(n)$ has converged to a vicinity of $\mathbf{w}_o$, the variance of the elements of the
stochastic gradient vector $\nabla e^2(n)$ is proportional to $\xi_{\min}$ (Problem P6.1). We also note
that, similar to $\xi_{\min}$, $\xi_{\text{excess}}$ also has the units of power. It is convenient to normalize $\xi_{\text{excess}}$
to $\xi_{\min}$, so that a dimension-free degradation measure is obtained. The result is called
misadjustment and denoted as $\mathcal{M}$. For the LMS algorithm, from Eq. (6.56), we obtain

$$\mathcal{M} = \frac{\xi_{\text{excess}}}{\xi_{\min}} = 4\mu^2\boldsymbol{\lambda}^T(\mathbf{I} - \mathbf{F})^{-1}\boldsymbol{\lambda} \qquad (6.57)$$

The special structure of the matrix $(\mathbf{I} - \mathbf{F})$ can be used to find its inverse.
We note from Eq. (6.51) that

$$\mathbf{I} - \mathbf{F} = \mathrm{diag}[1-\rho_0, 1-\rho_1, \ldots, 1-\rho_{N-1}] - 4\mu^2\boldsymbol{\lambda}\boldsymbol{\lambda}^T \qquad (6.58)$$


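On the other hand, we note that according to the matrix inversion lemma, for an arbitrary
positive-definite N-by-N matrix $\mathbf{A}$, any N-by-1 vector $\mathbf{a}$ and a scalar $\alpha$,

$$(\mathbf{A} + \alpha\mathbf{a}\mathbf{a}^T)^{-1} = \mathbf{A}^{-1} - \frac{\alpha\mathbf{A}^{-1}\mathbf{a}\mathbf{a}^T\mathbf{A}^{-1}}{1 + \alpha\mathbf{a}^T\mathbf{A}^{-1}\mathbf{a}} \qquad (6.59)$$

Moreover, for further reference later in this book, we also recall that the general
form of the matrix inversion lemma states: if $\mathbf{A}$ and $\mathbf{B}$ are positive-definite N-by-N and
M-by-M matrices, respectively, and $\mathbf{C}$ is an arbitrary N-by-M matrix, then

$$(\mathbf{A} + \mathbf{C}\mathbf{B}\mathbf{C}^T)^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{C}(\mathbf{B}^{-1} + \mathbf{C}^T\mathbf{A}^{-1}\mathbf{C})^{-1}\mathbf{C}^T\mathbf{A}^{-1} \qquad (6.60)$$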


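Letting $\mathbf{A} = \mathrm{diag}[1-\rho_0, 1-\rho_1, \ldots, 1-\rho_{N-1}]$, $\mathbf{a} = \boldsymbol{\lambda}$, and $\alpha = -4\mu^2$ in Eq. (6.59)
to obtain the inverse of $(\mathbf{I} - \mathbf{F})$, substituting the result in Eq. (6.57), and after some
straightforward manipulations, we get

$$\mathcal{M} = \frac{\displaystyle\sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-2\mu\lambda_i}}{1 - \displaystyle\sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-2\mu\lambda_i}} \qquad (6.61)$$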
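For readers who wish to retrace the manipulations that lead to Eq. (6.61), one possible sequence of intermediate steps is sketched here. The shorthand $S$ is introduced only for this sketch and is not part of the original development.

```latex
\begin{aligned}
1-\rho_i &= 4\mu\lambda_i - 8\mu^2\lambda_i^2 = 4\mu\lambda_i(1-2\mu\lambda_i), \\
S \equiv \boldsymbol{\lambda}^T\mathbf{A}^{-1}\boldsymbol{\lambda}
  &= \sum_{i=0}^{N-1}\frac{\lambda_i^2}{4\mu\lambda_i(1-2\mu\lambda_i)}
   = \frac{1}{4\mu^2}\sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-2\mu\lambda_i}, \\
\boldsymbol{\lambda}^T(\mathbf{I}-\mathbf{F})^{-1}\boldsymbol{\lambda}
  &= S + \frac{4\mu^2 S^2}{1-4\mu^2 S} = \frac{S}{1-4\mu^2 S}
  \quad\text{(Eq. (6.59) with } \alpha=-4\mu^2\text{)}, \\
\mathcal{M} &= 4\mu^2\,\boldsymbol{\lambda}^T(\mathbf{I}-\mathbf{F})^{-1}\boldsymbol{\lambda}
   = \frac{4\mu^2 S}{1-4\mu^2 S}.
\end{aligned}
```

Since $4\mu^2 S = \sum_{i=0}^{N-1}\mu\lambda_i/(1-2\mu\lambda_i)$, the last line is Eq. (6.61).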
It is useful to simplify this result, by making some appropriate approximations, so that
it can conveniently be used for the selection of the step-size parameter, $\mu$. In practice,
one usually selects $\mu$ so that a misadjustment of 10% ($\mathcal{M} = 0.1$) or less is achieved. In
that case, we may find that

$$\sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-2\mu\lambda_i} \approx \mu\sum_{i=0}^{N-1}\lambda_i = \mu\,\mathrm{tr}[\mathbf{R}] \qquad (6.62)$$

where the last equality is obtained from Eq. (4.24). This approximation is understood if
we note that when $\mathcal{M}$ is small, the summation on the left-hand side of Eq. (6.62) is also
small. Moreover, when the latter summation is small, $\mu\lambda_i \ll 1$, for $i = 0, 1, \ldots, N-1$,
and, thus, the terms $2\mu\lambda_i$ may be deleted from the denominators of the terms under that
summation. Thus, we obtain

$$\mathcal{M} \approx \frac{\mu\,\mathrm{tr}[\mathbf{R}]}{1 - \mu\,\mathrm{tr}[\mathbf{R}]} \qquad (6.63)$$

Furthermore, we note that when $\mathcal{M}$ is small, say $\mathcal{M} \le 0.1$, $\mu\,\mathrm{tr}[\mathbf{R}]$ is also small and, thus,
it may be ignored in the denominator of Eq. (6.63), to obtain

$$\mathcal{M} \approx \mu\,\mathrm{tr}[\mathbf{R}] \qquad (6.64)$$


This is a very convenient equation, as $\mathrm{tr}[\mathbf{R}]$ is equal to the sum of the powers of the signal
samples at the filter tap inputs. This can be easily measured and used for the selection of
the step-size parameter, $\mu$, for achieving a certain level of misadjustment. Furthermore,
when the input process to the filter is nonstationary, $\mathrm{tr}[\mathbf{R}]$ may be updated recursively and
the step-size parameter, $\mu$, chosen accordingly to keep a certain level of misadjustment.
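The following sketch compares the misadjustment of Eq. (6.61) with the approximations of Eqs. (6.63) and (6.64) when the step size is chosen from Eq. (6.64). The eigenvalues are assumed values for illustration. The function `update_power` is a hypothetical helper showing one possible (exponentially weighted) way of tracking the input power recursively; the text does not prescribe a particular recursion.

```python
def misadjustment_exact(mu, lams):
    """Eq. (6.61): M = J / (1 - J), J = sum of mu*lam / (1 - 2*mu*lam)."""
    J = sum(mu * l / (1.0 - 2.0 * mu * l) for l in lams)
    return J / (1.0 - J)


def misadjustment_eq_6_63(mu, trace_R):
    """Eq. (6.63): M ~ mu*tr[R] / (1 - mu*tr[R])."""
    return mu * trace_R / (1.0 - mu * trace_R)


def misadjustment_approx(mu, trace_R):
    """Eq. (6.64): M ~ mu*tr[R]."""
    return mu * trace_R


def step_size_for(target_M, trace_R):
    """Invert Eq. (6.64) to pick mu for a target misadjustment."""
    return target_M / trace_R


def update_power(power_est, x_new, beta=0.99):
    """Hypothetical recursive power estimate; tr[R] would be about N times this value."""
    return beta * power_est + (1.0 - beta) * x_new * x_new


# Assumed, illustrative eigenvalues of R (not taken from the text).
lams = [0.5, 1.0, 2.0]
trace_R = sum(lams)

for target in (0.1, 0.3):
    mu = step_size_for(target, trace_R)
    print(f"target {target:.0%}: mu = {mu:.4f}, "
          f"Eq. (6.63) gives {misadjustment_eq_6_63(mu, trace_R):.3f}, "
          f"Eq. (6.61) gives {misadjustment_exact(mu, lams):.3f}")
```

For these assumed eigenvalues, the gap between Eq. (6.64) and Eq. (6.61) widens as the target misadjustment grows, in line with the remark that the simplified formula is intended for misadjustments of about 10% or less.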


6.3.4 Stability 


In Chapter 5, we noted that the steepest-descent algorithm remains stable only when its
corresponding step-size parameter, $\mu$, takes a value between zero and an upper bound
value, which was found to be dependent on the statistics of the filter input. The same is
true for the LMS algorithm. However, the use of a stochastic gradient in the LMS algorithm
makes it more sensitive to the value of its step-size parameter, $\mu$, and, as a result, the
upper bound of $\mu$, which can ensure a stable behavior of the LMS algorithm, is much
lower than the corresponding bound in the case of the steepest-descent algorithm. To find
the upper bound of $\mu$ that guarantees the stability of the LMS algorithm, we elaborate
on the misadjustment equation (6.61).


We define

$$J = \sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-2\mu\lambda_i} \qquad (6.65)$$

and note that

$$\mathcal{M} = \frac{J}{1-J} \qquad (6.66)$$

We also note that

$$\frac{\partial J}{\partial\mu} = \sum_{i=0}^{N-1}\frac{\lambda_i}{(1-2\mu\lambda_i)^2} \qquad (6.67)$$

From Eq. (6.67), we note that $J$ is an increasing function of $\mu$, since its derivative
with respect to $\mu$ is always positive. In a similar way, one can show that $\mathcal{M}$ is an
increasing function of $J$. This, in turn, implies that the misadjustment $\mathcal{M}$ of Eq. (6.61)
is an increasing function of the step-size parameter, $\mu$. Thus, starting with $\mu = 0$ (i.e.,
the lower bound of $\mu$) and increasing $\mu$, we find that $J$ and $\mathcal{M}$ also start from zero and
increase with $\mu$. We also note that when $J$ approaches unity, $\mathcal{M}$ tends to infinity. This
clearly coincides with the upper bound of the step-size parameter, say $\mu_{\max}$, below which
$\mu$ has to remain to ensure a stable behavior of the LMS algorithm. Thus, the value of
$\mu_{\max}$ is obtained by finding the first positive root of the equation

$$\sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-2\mu\lambda_i} = 1 \qquad (6.68)$$


Finding the exact solution of this problem, in general, turns out to be a difficult
mathematical task. Furthermore, from a practical point of view, such a solution is not rewarding
as it depends on the statistics of the filter input in a complicated way. Here, we give an
upper bound of $\mu$, which depends only on $\sum_{i=0}^{N-1}\lambda_i = \mathrm{tr}[\mathbf{R}]$. This results in a smaller
(more stringent) value as the upper bound of $\mu$, but a value that can easily be measured
in practice. For this, we note that when

$$0 < \mu < \frac{1}{2\sum_{i=0}^{N-1}\lambda_i} \qquad (6.69)$$

the following inequality always holds:

$$\sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-2\mu\lambda_i} \le \frac{\mu\sum_{i=0}^{N-1}\lambda_i}{1-2\mu\sum_{i=0}^{N-1}\lambda_i} \qquad (6.70)$$

The proof of this inequality is discussed in Problem P6.4. From Eq. (6.70), we find that
the value of $\mu$, which satisfies the equation

$$\frac{\mu\sum_{i=0}^{N-1}\lambda_i}{1-2\mu\sum_{i=0}^{N-1}\lambda_i} = 1 \qquad (6.71)$$

satisfies the inequality

$$\sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-2\mu\lambda_i} \le 1 \qquad (6.72)$$

Furthermore, any value of $\mu$ that remains between zero and the solution of Eq. (6.71)
satisfies Eq. (6.72). This means that Eq. (6.71) gives an upper bound for $\mu$, which is
sufficient for the stability of the LMS algorithm, but is not necessary, in general. If we
call the solution of Eq. (6.71) $\mu'_{\max}$, we obtain

$$\mu'_{\max} = \frac{1}{3\sum_{i=0}^{N-1}\lambda_i} = \frac{1}{3\,\mathrm{tr}[\mathbf{R}]} \qquad (6.73)$$

To summarize, we found that, under the assumptions made at the beginning of this
section, the LMS algorithm remains stable when

$$0 < \mu < \frac{1}{3\,\mathrm{tr}[\mathbf{R}]} \qquad (6.74)$$

The significance of the upper bound of $\mu$, which is provided by Eq. (6.74), is that it
can easily be measured from the filter input samples. We also note that the range of $\mu$,
which is provided by Eq. (6.74), is sufficient for the stability of the LMS algorithm but is
not necessary. The first positive root of Eq. (6.68) gives a more accurate upper bound of
$\mu$. However, this depends on the filter input statistics in a very complicated way, which
prohibits its applicability in actual practice.
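To see how conservative the bound of Eq. (6.74) can be, the sketch below locates the first positive root of Eq. (6.68) numerically and compares it with $1/(3\,\mathrm{tr}[\mathbf{R}])$. Bisection is simply a convenient choice made for this sketch, and the eigenvalues are assumed values for illustration.

```python
def J(mu, lams):
    """Eq. (6.65): J = sum of mu*lam / (1 - 2*mu*lam)."""
    return sum(mu * l / (1.0 - 2.0 * mu * l) for l in lams)


def mu_max_exact(lams, tol=1e-12):
    """First positive root of J(mu) = 1, Eq. (6.68), found by bisection.

    J rises monotonically from 0 toward +infinity on (0, 1/(2*max(lams))),
    so the first positive root lies inside that interval.
    """
    lo, hi = 0.0, 1.0 / (2.0 * max(lams))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if J(mid, lams) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mu_max_bound(lams):
    """Sufficient bound of Eq. (6.73): 1 / (3 * tr[R])."""
    return 1.0 / (3.0 * sum(lams))


# Assumed, illustrative eigenvalues of R (not taken from the text).
lams = [0.5, 1.0, 2.0]
print("root of Eq. (6.68):", mu_max_exact(lams))
print("bound of Eq. (6.73):", mu_max_bound(lams))
```

When there is a single eigenvalue, the two values coincide; with several eigenvalues, the trace-based bound is the smaller of the two, as expected of a sufficient condition.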


6.3.5 The Effect of Initial Values of Tap Weights on the Transient 
Behavior of the LMS Algorithm 


As was noted before, the LMS algorithm on average follows the same trajectory as the
steepest-descent algorithm. As a result, the learning curves of the two algorithms are
found to be similar when the same step-size parameter is used for both. In particular, the
learning curve equation (5.31) is also (approximately) applicable to the LMS algorithm.
Thus, we may write

$$\xi(n) \approx \xi_{\min} + \sum_{i=0}^{N-1}\lambda_i(1-2\mu\lambda_i)^{2n}v'^2_i(0) \qquad (6.75)$$

In most applications, the filter tap weights are all initialized to zero. In that case,

$$\mathbf{v}(0) = \mathbf{w}(0) - \mathbf{w}_o = -\mathbf{w}_o \qquad (6.76)$$

Using this result and recalling the definition $\mathbf{v}'(0) = \mathbf{Q}^T\mathbf{v}(0)$, we get

$$\mathbf{v}'(0) = -\mathbf{w}'_o \qquad (6.77)$$

where $\mathbf{w}'_o = \mathbf{Q}^T\mathbf{w}_o$. Using Eq. (6.77) in Eq. (6.75), we obtain

$$\xi(n) \approx \xi_{\min} + \sum_{i=0}^{N-1}\lambda_i(1-2\mu\lambda_i)^{2n}w'^2_{o,i} \qquad (6.78)$$

where $w'_{o,i}$ is the $i$th element of $\mathbf{w}'_o$.

The contribution of various modes of convergence of the LMS algorithm (i.e., the terms
under the summation on the right-hand side of Eq. (6.75)) to its learning curve depends
on the coefficients $\lambda_i w'^2_{o,i}$. As a result, one finds that even for a similar eigenvalue
distribution, the convergence behavior of the LMS algorithm is application dependent.
For instance, if the $w'_{o,i}$'s corresponding to the smaller eigenvalues of $\mathbf{R}$ are all close to
zero, the transient behavior of the LMS algorithm is determined by the larger eigenvalues
of $\mathbf{R}$ whose associated time constants are small; thus a fast convergence is observed. On
the contrary, if the $w'_{o,i}$'s corresponding to the smaller eigenvalues of $\mathbf{R}$ are significantly
large, one finds that the slower modes of the LMS algorithm are prominent on its learning
curve. Examples given in the next section show that these two extreme cases can happen
in practice.
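The two extreme cases described above can be reproduced directly from Eq. (6.78). In the sketch below, the eigenvalues, the step size, the minimum MSE and the two choices of $\mathbf{w}'_o$ are assumed values; the second choice is scaled so that both cases start from the same excess MSE.

```python
def learning_curve(n, mu, lams, w_o_prime, xi_min):
    """Eq. (6.78): MSE at iteration n when the tap weights start from zero."""
    return xi_min + sum(l * (1.0 - 2.0 * mu * l) ** (2 * n) * w * w
                        for l, w in zip(lams, w_o_prime))


# Assumed, illustrative values (not taken from the text).
mu, xi_min = 0.05, 0.001
lams = [0.1, 2.0]
fast_case = [0.0, 1.0]            # w'_o lies along the large-eigenvalue mode
slow_case = [20.0 ** 0.5, 0.0]    # same initial excess MSE, small-eigenvalue mode

fast = [learning_curve(n, mu, lams, fast_case, xi_min) for n in range(101)]
slow = [learning_curve(n, mu, lams, slow_case, xi_min) for n in range(101)]

for n in (0, 20, 100):
    print(f"n = {n:3d}: fast case {fast[n]:.5f}, slow case {slow[n]:.5f}")
```

Although both cases share the same eigenvalues and step size, the first curve reaches the neighborhood of the minimum MSE within a few tens of iterations, while the second is still far above it after 100 iterations.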
6.4 Computer Simulations


In the study of adaptive filters, computer simulation plays a major role. In the analysis 
that was presented in the previous section, we had to consider a number of assumptions 
to make the problem mathematically tractable. The validity of these assumptions and the 
matching between mathematical results and the actual performance of adaptive filters are 
usually verified through computer simulations. 

In this section, we present a few examples of computer simulations. We present
examples of four different applications of adaptive filters:

System modeling
Channel equalization
Adaptive line enhancement (this is an example of prediction)
Beamforming

Our objectives in this presentation are:

1. To help the novice readers to have a fast start in doing computer simulations.
2. To check the accuracy of the developed theoretical results.
3. To enhance the understanding of the theoretical results by careful observation and
interpretation of simulation results.



All the results given in the following have been generated using the MATLAB
numerical package. The MATLAB programs used to generate the results presented
in this section and other parts of this book are available on an accompanying website. A
list of these programs (m-files as they are called in MATLAB) is given at the end of the
book and also in the read.me file on the accompanying website. We encourage all
novice readers to try to run these programs, as this, we believe, is essential for a better
understanding of the adaptive filtering concepts.


6.4.1 System Modeling 


Consider a system modeling problem, as depicted in Figure 6.5. The filter input is obtained
by passing a unit variance white Gaussian sequence, $\nu(n)$, through a filter with the system
function $H(z)$. The plant, $W_o(z)$, is assumed to be an FIR system with the impulse response
duration of $N$ samples. The plant output is contaminated with an additive white Gaussian
noise sequence, $e_o(n)$, with variance $\sigma_o^2$. An N-tap adaptive filter, $W(z)$, is used to estimate
the plant parameters.

Figure 6.5 Adaptive modeling of an FIR plant.

For simulations, in this section, we select $N = 15$, $\sigma_o^2 = 0.001$ and

$$W_o(z) = \sum_{i=0}^{7} z^{-i} - \sum_{i=8}^{14} z^{-i} \qquad (6.79)$$

We present results of simulations for two choices of input, which are characterized by

$$H(z) = H_1(z) = 0.35 + z^{-1} - 0.35z^{-2} \qquad (6.80)$$

and

$$H(z) = H_2(z) = 0.35 + z^{-1} + 0.35z^{-2} \qquad (6.81)$$


The first choice results in an input, $x(n)$, whose corresponding correlation matrix has an
eigenvalue spread of 1.45. This is close to a white input. On the contrary, the second
choice of $H(z)$ results in a highly colored input with an associated eigenvalue spread
of 28.7. From the results of Chapter 4, we recall that the eigenvalue spread figures can
approximately be obtained from the underlying power spectral densities. Figure 6.6 shows
the power spectral densities of the two inputs generated using the filters $H_1(z)$ and $H_2(z)$.

These plots are obtained by noting that

$$\Phi_{xx}(e^{j\omega}) = \Phi_{\nu\nu}(e^{j\omega})\,|H(e^{j\omega})|^2 \qquad (6.82)$$

and $\Phi_{\nu\nu}(e^{j\omega}) = 1$, since $\nu(n)$ is a unit variance white noise process. The fact that $H_2(z)$
generates a process that is highly colored, while the process generated by $H_1(z)$ is
relatively flat, is clearly seen.

Figure 6.6 Power spectral densities of the two input processes used for the simulation of the
modeling problem: (a) $H(z) = H_1(z)$ and (b) $H(z) = H_2(z)$.
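The relation of Eq. (6.82) can be evaluated on a frequency grid to get a rough feel for the eigenvalue spreads. The sketch below computes $|H(e^{j\omega})|^2$ for the two filters of Eqs. (6.80) and (6.81) and reports the ratio of its largest to smallest value. The grid size and the function names are choices made for this sketch.

```python
import cmath
import math


def psd(h, omega):
    """|H(e^{j omega})|^2 for FIR coefficients h: Eq. (6.82) with a unit-variance white input."""
    H = sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))
    return abs(H) ** 2


def psd_ratio(h, n_points=2001):
    """Ratio of the largest to the smallest PSD value on a uniform grid over [0, pi]."""
    values = [psd(h, math.pi * m / (n_points - 1)) for m in range(n_points)]
    return max(values) / min(values)


h1 = [0.35, 1.0, -0.35]   # H_1(z), Eq. (6.80)
h2 = [0.35, 1.0, 0.35]    # H_2(z), Eq. (6.81)

print("max/min PSD for H_1:", psd_ratio(h1))
print("max/min PSD for H_2:", psd_ratio(h2))
```

The two ratios come out at about 1.49 and 32.1. These are slightly larger than the eigenvalue spreads of 1.45 and 28.7 quoted for the 15-by-15 correlation matrices, which is consistent with the PSD extremes bounding the eigenvalues of a finite-size correlation matrix.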

Figure 6.7a and b shows the learning curves of the LMS algorithm for the two choices
of $H(z)$. The step-size parameter, $\mu$, is selected according to the simplified misadjustment
equation (6.64) for the misadjustment values 10%, 20%, and 30%. The filter tap weights
are initialized to zero. Each plot is obtained by an ensemble average of 100 independent
simulation runs. We note that $\xi_{\min} = \sigma_o^2 = 0.001$, and this is achieved when the model
and plant coefficients match. Careful examination of the results presented in Figure 6.7a
and b reveals that the predictions made by Eq. (6.64) are accurate for the cases where
$\mu$ is set for a misadjustment of 10% (or less). For larger values of $\mu$, one finds that a
more accurate theoretical estimate of the misadjustment is obtained using Eq. (6.61). Such
an estimate, of course, requires calculation of the eigenvalues of the correlation matrix $\mathbf{R}$.
The MATLAB program "modeling.m" on the accompanying website contains instructions,
which generate matrix $\mathbf{R}$ and the other parameters required for these calculations.
The reader is encouraged to use this program and experiment with it to examine the
effect of various parameters, such as the step-size, $\mu$, the plant model, $W_o(z)$, and the
input sequence to the adaptive filter. Such experiments will greatly enhance the reader's
understanding of the concepts of convergence and misadjustment.

Experiments with the LMS algorithm show that the accuracy of the misadjustment
equations developed above varies with the statistics of the filter input and the step-size
parameter. For example, one finds that all of the three plots in Figure 6.7a and two of the
plots in Figure 6.7b match the theoretical predictions made by Eq. (6.61), but the third
plot in Figure 6.7b (i.e., the case $\mathcal{M} = 30\%$) does not match Eq. (6.61). In the latter
case, the LMS algorithm experiences some instability problem. The mismatch between
the theory and experiments here is attributed to the fact that the independence assumption
made in the development of the theoretical results is badly violated for larger values of $\mu$.
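For readers without access to MATLAB, the modeling experiment can also be sketched in a few lines of plain Python. This is not the book's "modeling.m" program; it is an independent, minimal sketch that assumes the plant of Eq. (6.79) consists of eight taps equal to +1 followed by seven taps equal to -1, uses the input filter $H_1(z)$, and averages over only 10 runs to keep the running time short.

```python
import random


def lms_system_modeling(h, w_plant, mu, n_iter, noise_var, rng):
    """Run one LMS modeling experiment; return (final weights, squared errors)."""
    N = len(w_plant)
    w = [0.0] * N                  # adaptive filter taps, initialized to zero
    nu_buf = [0.0] * len(h)        # recent white-noise samples feeding H(z)
    x_buf = [0.0] * N              # tap-input vector x(n), newest sample first
    noise_std = noise_var ** 0.5
    e2 = []
    for _ in range(n_iter):
        nu_buf = [rng.gauss(0.0, 1.0)] + nu_buf[:-1]
        x_n = sum(hk * v for hk, v in zip(h, nu_buf))
        x_buf = [x_n] + x_buf[:-1]
        d = sum(wo * x for wo, x in zip(w_plant, x_buf)) + rng.gauss(0.0, noise_std)
        e = d - sum(wi * x for wi, x in zip(w, x_buf))
        # LMS recursion: w(n+1) = w(n) + 2*mu*e(n)*x(n)
        w = [wi + 2.0 * mu * e * x for wi, x in zip(w, x_buf)]
        e2.append(e * e)
    return w, e2


rng = random.Random(0)
h1 = [0.35, 1.0, -0.35]                      # H_1(z), Eq. (6.80)
w_plant = [1.0] * 8 + [-1.0] * 7             # assumed reading of Eq. (6.79)
N = len(w_plant)
trace_R = N * sum(hk * hk for hk in h1)      # tr[R] = N * E[x^2(n)]
mu = 0.1 / trace_R                           # 10% misadjustment via Eq. (6.64)

n_runs, n_iter = 10, 2000
avg_e2 = [0.0] * n_iter
for _ in range(n_runs):
    w_final, e2 = lms_system_modeling(h1, w_plant, mu, n_iter, 0.001, rng)
    avg_e2 = [a + b / n_runs for a, b in zip(avg_e2, e2)]

steady_state_mse = sum(avg_e2[-500:]) / 500
print("estimated steady-state MSE:", steady_state_mse)
print("estimated misadjustment:", steady_state_mse / 0.001 - 1.0)
```

Replacing `h1` with the coefficients of $H_2(z)$, or raising the target misadjustment, is a simple way to explore the effects discussed in the text; the estimates become smoother as the number of runs is increased.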


6.4.2 Channel Equalization 


Figure 6.8 depicts a channel equalization problem. The input sequence to the channel
is assumed to be binary (taking values of +1 and −1) and white. The channel system
function is denoted by $H(z)$. The channel noise, $\nu_c(n)$, is modeled as an additive white
Gaussian process with variance $\sigma_{\nu_c}^2$. The equalizer is implemented as an N-tap transversal
filter. The desired output of the equalizer is assumed to be $s(n-\Delta)$, that is, a delayed
replica of the transmitted data symbols. For the training of the equalizer, it is assumed
that the transmitted data symbols are available at the receiver. This is called training
mode. Once the equalizer is trained and switched to the data mode, its output, after
passing through a slicer, gives the transmitted symbols. A discussion on the training and
data modes of equalizers can be found in Chapter 1. More detailed explanations and
adaptation algorithms are presented in Chapter 17.

Figure 6.7 Learning curves of the LMS algorithm for the modeling problem of Figure 6.5, for
the two input processes discussed in the text: (a) $H(z) = H_1(z)$ and (b) $H(z) = H_2(z)$. The
step-size parameter, $\mu$, is selected for the misadjustment values 10%, 20%, and 30%, according to the
simplified equation (6.64).

Figure 6.8 Adaptive channel equalization.

Two choices of the channel response, $H(z)$, are considered for our study here. These
are purposefully selected to be the same as the two choices of $H(z)$ in the modeling
problem above, where $H(z)$ was used to shape the power spectral density of the input
process to the plant and model. This facilitates a comparison of the results in the two
cases. In particular, we note that, in the present problem,

$$\Phi_{xx}(e^{j\omega}) = \Phi_{ss}(e^{j\omega})\,|H(e^{j\omega})|^2 + \Phi_{\nu_c\nu_c}(e^{j\omega}) = |H(e^{j\omega})|^2 + \sigma_{\nu_c}^2 \qquad (6.83)$$

Comparing Eqs. (6.82) and (6.83), we note that when a similar $H(z)$ is used for both
cases and the signal-to-noise ratio at the channel output is high (i.e., $\sigma_{\nu_c}^2$ is small),
the power spectral densities of the input samples to the two adaptive filters are almost
the same. This, in turn, implies that the convergence of both filters is controlled
by the same set of eigenvalues. As a result, on average, one may expect to see similar
learning curves for both cases.

Figure 6.9a and b presents the learning curves of the equalizer for the two choices of
the channel response, that is, $H_1(z)$ and $H_2(z)$ of Eqs. (6.80) and (6.81), respectively.
The equalizer length, $N$, and the delay, $\Delta$, are set equal to 15 and 9, respectively. The
step-size parameter, $\mu$, is chosen according to the simplified equation (6.64) for the three
misadjustment values 10%, 20%, and 30%. The equalizer tap weights are initialized to
zero. Each plot is based on an ensemble average of 100 independent simulation runs. The
MATLAB program used to obtain these results is available on the accompanying website.
It is called "equalizer.m." Careful study of Figure 6.9a and b and further numerical
tests (using "equalizer.m" or any similar simulation program) reveal that, similar
to the modeling case, the theoretical and simulation results match well when the
step-size parameter, $\mu$, is small. However, the accuracy of the theoretical results is lost for
larger values of $\mu$. The latter effect is more noticeable when the eigenvalue spread of the
correlation matrix $\mathbf{R}$ is large.

Comparing the results presented in Figures 6.7a and 6.9a, we find that the performance
of the adaptive filters in both cases is about the same. Moreover, these results compare
very well with the predictions made by theory. We recall that these correspond to the
case where the eigenvalue spread of the correlation matrix $\mathbf{R}$ is small. Some differences
between the results of the two cases are observed as the eigenvalue spread of $\mathbf{R}$ increases.
In particular, a comparison of Figures 6.7b and 6.9b shows that the learning curve of the
channel equalizer is dominantly controlled by its slower modes of convergence, while
in the modeling case, a balance of slow and fast modes of convergence is observed.
In the latter case, a drop of MSE from 10 to 0.1 within the first 100 iterations of the
LMS algorithm is observed. The slower modes of the algorithm are observed after the
filter output MSE has dropped to a relatively low level. As a result, the existence of
slow and fast modes of convergence on the learning curve is clearly visible. On the
contrary, in the case of channel equalization, we find that the convergence of the LMS
algorithm is dominantly determined by its slower modes. We can hardly see any fast
mode of convergence on the learning curves presented in Figure 6.9b. An explanation of
this phenomenon, which is usually observed when the LMS algorithm is used to adapt
channel equalizers, is instructive.

Figure 6.9 Learning curves of the LMS algorithm for the channel equalizer, for the two choices
of channel responses discussed in the text: (a) $H(z) = H_1(z)$ and (b) $H(z) = H_2(z)$. The
step-size parameter, $\mu$, is selected for the misadjustment values 10%, 20%, and 30%, according to the
simplified equation (6.64).

As was noted before, besides the eigenvalue spread of $\mathbf{R}$, the transient behavior of
the LMS algorithm is also affected by the initial offset of the filter tap weights from
their optimal values; see Eq. (6.75). We also noted that when the filter tap weights are
initialized to zero, the transient behavior of the LMS algorithm is affected by the optimum
tap weights of the filter; see Eq. (6.78). To be more precise, the contribution of various
modes of convergence of the LMS algorithm in shaping its learning curve is determined
by the values of $\lambda_i w'^2_{o,i}$, for $i = 0, 1, \ldots, N-1$.

For a modeling problem, the statistics of the filter input and its optimum tap weights,
$\mathbf{w}_o$ (i.e., the plant response), are, in general, independent of each other. In this situation,
it is hard to make any comment on the values of the $\lambda_i w'^2_{o,i}$ terms. The only comment which
may be made is that if one assumes the statistics of the filter input are fixed and the plant
response is arbitrary, the elements of $\mathbf{w}'_o$, that is, the $w'_{o,i}$'s, may be thought of as a set of
zero-mean random variables whose values change from one plant to another, and they all have
the same variance, say $\sigma_{w_o}^2$. Using this in Eq. (6.78), we obtain, for the modeling problem,

$$E[\xi(n)] \approx \xi_{\min} + \sigma_{w_o}^2\sum_{i=0}^{N-1}\lambda_i(1-2\mu\lambda_i)^{2n} \qquad (6.84)$$

where the statistical expectation on $\xi(n)$ is with respect to the variations of the $w'_{o,i}$'s, that
is, the plant response.

On the contrary, in the case of channel equalization, there is a close relationship between
the filter (equalizer) input statistics and the optimum setting of its tap weights. The
equalizer is adapted to implement the inverse of the channel response, that is,

$$W_o(z) \approx \frac{z^{-\Delta}}{H(z)} \qquad (6.85)$$

This result, which may be referred to as the spectral inversion property of the channel equalizer,
can be used to evaluate $\lambda_i w'^2_{o,i}$ when the equalizer length is relatively long. A
procedure for approximation of $\lambda_i w'^2_{o,i}$ is discussed in Problem P6.14. The result there is
that when the equalizer length $N$ is relatively long

$$\lambda_i w'^2_{o,i} \approx \frac{1}{N}, \quad \text{for } i = 0, 1, \ldots, N-1 \qquad (6.86)$$

Substituting this in Eq. (6.78), we get, for an N-tap channel equalizer,

$$\xi(n) \approx \xi_{\min} + \frac{1}{N}\sum_{i=0}^{N-1}(1-2\mu\lambda_i)^{2n} \qquad (6.87)$$
The difference between the learning curves of the modeling and channel equalization
problems may now be explained by comparing Eqs. (6.84) and (6.87). When the
eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$ are widely spread and $n$ is small (i.e., the adaptation has just
started), the summation on the right-hand side of Eq. (6.84) is dominantly determined by
the larger $\lambda_i$'s. However, noting that the geometric decay factors, $(1-2\mu\lambda_i)^{2n}$,
corresponding to the larger $\lambda_i$'s converge to zero at a relatively fast rate, the summation
on the right-hand side of Eq. (6.84) experiences a fast drop to a level significantly below
its initial value at $n = 0$. The slower modes of the LMS algorithm are observed after
this initial fast drop of the MSE. This, of course, is what we observe in Figure 6.7b.
In the case of the channel equalizer, we note that when $n$ is small, all the terms under the
summation on the right-hand side of Eq. (6.87) are about the same. This means there is
no dominant term in the latter summation and, as a result, unlike the modeling problem
case, the convergence of the faster modes of the LMS algorithm may not reduce $\xi(n)$
significantly. A significant reduction of $\xi(n)$ after convergence of the faster modes of the
LMS algorithm may only be observed when the filter length, $N$, is large and only a few
of the eigenvalues of $\mathbf{R}$ are small.
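The contrast between Eqs. (6.84) and (6.87) can be checked numerically. The sketch below evaluates both expressions for one set of eigenvalues and reports the fraction of the initial excess MSE that remains after 100 iterations. The eigenvalues (15 values spread geometrically over a range of 30), the step size and the value of $\sigma_{w_o}^2$ are assumed values for illustration, not those of the simulated examples.

```python
def modeling_curve(n, mu, lams, sigma2_w, xi_min):
    """Eq. (6.84): expected learning curve of the modeling problem."""
    return xi_min + sigma2_w * sum(l * (1.0 - 2.0 * mu * l) ** (2 * n) for l in lams)


def equalizer_curve(n, mu, lams, xi_min):
    """Eq. (6.87): learning curve of a long N-tap channel equalizer."""
    return xi_min + sum((1.0 - 2.0 * mu * l) ** (2 * n) for l in lams) / len(lams)


def remaining_excess(curve, n, xi_min):
    """Excess MSE at iteration n as a fraction of the excess MSE at n = 0."""
    return (curve(n) - xi_min) / (curve(0) - xi_min)


# Assumed eigenvalues: 15 values spread geometrically from 0.1 to 3.0.
lams = [0.1 * 30.0 ** (i / 14.0) for i in range(15)]
mu = 0.1 / sum(lams)          # 10% misadjustment via Eq. (6.64)
xi_min = 0.001

frac_modeling = remaining_excess(
    lambda n: modeling_curve(n, mu, lams, 1.0, xi_min), 100, xi_min)
frac_equalizer = remaining_excess(
    lambda n: equalizer_curve(n, mu, lams, xi_min), 100, xi_min)

print("modeling: fraction of excess MSE left after 100 iterations:", frac_modeling)
print("equalizer: fraction of excess MSE left after 100 iterations:", frac_equalizer)
```

In the modeling expression each mode is weighted by its eigenvalue, so the quickly decaying modes carry most of the initial excess MSE; in the equalizer expression all modes carry equal weight, and a much larger share of the excess MSE is left to the slow modes.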


6.4.3 Adaptive Line Enhancement 


Adaptive line enhancement refers to the case where a noisy signal consisting of a few 
sinusoidal components is available and the aim is to filter out the noise part of the signal. 
The filtering solution to this problem is trivial. The noisy signal is passed through a filter, 
which is tuned to the sinusoidal components. When the frequency of the sine-waves 
present in the noisy signal is known, of course, a fixed filter will suffice. However, when 
the sine-wave frequencies are unknown or may be time-varying, an adaptive solution has 
to be adopted. 

Figure 6.10 depicts the block schematic of an adaptive line enhancer. It is basically an
M-step-ahead predictor. The assumption is that the noise samples, which are more than
$M$ samples apart, are uncorrelated with one another. As a result, the predictor can only
make a prediction of the sinusoidal components of the input signal and, when adapted
to minimize the output MSE, the line enhancer will be a filter tuned to the sinusoidal
components. The maximum possible rejection of the noise will also be achieved, as any
portion of the noise that passes through the prediction filter will increase the output
MSE, whose minimization is the criterion in adapting the filter tap weights.

Figure 6.10 Adaptive line enhancer.

Here, to simplify our discussion, we assume that the enhancer input consists of a single
sinusoidal component and the additive noise is white. More specifically, we assume that

$$x(n) = a\sin(\omega_o n + \theta) + \nu(n) \qquad (6.88)$$

where $\nu(n)$ is a white noise sequence. The delay parameter $M$ is set to 1 as $\nu(n)$ is white.

Figure 6.11 shows the learning curves of the adaptive line enhancer when $x(n)$ is chosen
as in Eq. (6.88). The following parameters are used to obtain these results: $N = 30$, $M = 1$,
$a = 1$, $\omega_o = 0.1$, and $\theta$ is chosen to be a random variable with uniform distribution in
the range of 0 to $2\pi$, for different simulation runs. The variance of $\nu(n)$ is chosen 10 dB
below the sinusoidal signal power. The learning curves are given for three choices of the
step-size parameter, $\mu$, which result in 1%, 5%, and 10% misadjustment. The predictor
tap weights are initialized to zero. The program used to obtain these results is available
on the accompanying website. It is called "lenhncr.m."

From the results presented in Figure 6.11, it appears that the convergence of the 
line enhancer is governed by only one mode. Examination of the eigenvalues of the 
underlying process and the resulting time constants of the various modes of the line 
enhancer reveals that the mode which is observed in Figure 6.11 coincides with the 
fastest convergence mode of the LMS algorithm in the present case. An explanation of 
this phenomenon is instructive. 

We note that the optimized predictor of the line enhancer is a filter tuned to the peak of
the spectrum of x(n). Furthermore, from the minimax theorem (of Chapter 4), we may say
that the latter is the eigenfilter associated with the maximum eigenvalue of the correlation
matrix R of the underlying process. This implies that the optimum tap-weight vector of the
line enhancer coincides with the eigenvector associated with the largest eigenvalue of its
corresponding correlation matrix. In other words, in the Euclidean space associated with
the tap weights of the line enhancer, the line connecting the origin to the point defined by
the optimized tap weights is along the eigenvector associated with the largest eigenvalue of
its corresponding correlation matrix. Because the tap weights are initialized to zero, the
initial tap-weight error vector lies along this line, so essentially only the corresponding
mode is excited. This explains why the learning curves of the
line enhancer presented in Figure 6.11 are dominantly controlled by only one mode and why
this mode coincides with the fastest mode of convergence of the corresponding LMS algorithm.

Figure 6.11 Learning curves of the adaptive line enhancer. The line enhancer MSE is normalized
to the input signal power.


6.4.4 Beamforming 


Consider a two-element antenna array similar to the one discussed in Example 3.6. The
array consists of two omnidirectional (equally sensitive to all directions) antennas A and
B, as shown in Figure 6.12. A desired signal $s(n) = \alpha(n)\cos(n\omega_o + \phi_1)$ arrives in the
direction perpendicular to the line connecting A and B. An interferer (jammer) signal
$\nu(n) = \beta(n)\cos(n\omega_o + \phi_2)$ arrives at an angle $\theta_o$ relative to s(n). The signal sequences
s(n) and $\nu(n)$ are assumed to be narrow-band processes with random phases $\phi_1$ and $\phi_2$,
respectively. It is also assumed that the random amplitudes $\alpha(n)$ and $\beta(n)$ are zero-mean
and uncorrelated with each other. The two omnis are separated by a distance of $l = \lambda_c/2$
meters, where $\lambda_c$ is the wavelength associated with the continuous time carrier frequency

$$\omega_c = \frac{\omega_o}{T} \tag{6.89}$$

with T being the sampling period. The coefficients, $w_0$ and $w_1$, of the beamformer are
adjusted, so that the output error, e(n), is minimized in the mean-square sense.

As in Example 3.6, the adaptive beamformer of Figure 6.12 is characterized by the
following signal sequences⁴:


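1. Primary input

$$d(n) = \alpha(n)\cos(n\omega_o + \phi_1) + \beta(n)\cos(n\omega_o + \phi_2 - \phi_o) \tag{6.90}$$

2. Reference tap-input vector

$$\mathbf{x}(n) = \begin{bmatrix} \alpha(n)\cos(n\omega_o + \phi_1) + \beta(n)\cos(n\omega_o + \phi_2) \\ \alpha(n)\sin(n\omega_o + \phi_1) + \beta(n)\sin(n\omega_o + \phi_2) \end{bmatrix} \tag{6.91}$$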


The phase shift $\phi_o$ is introduced because of the difference between the arrival times of the
jammer at A and B. It is given by

$$\phi_o = \frac{l\sin\theta_o}{c}\,\omega_c \tag{6.92}$$

where c is the propagation speed. Replacing $l$ with $\lambda_c/2$ in Eq. (6.92) and noting that
$\omega_c/c = 2\pi/\lambda_c$, we obtain

$$\phi_o = \pi\sin\theta_o \tag{6.93}$$

We note that, as expected, $\phi_o$ is independent of the sampling period T. It depends only
on the angle of arrival of the jammer signal, $\theta_o$.


4 In Example 3.6, to simplify the derivations, $\phi_1$ and $\phi_2$ were assumed to be zero.

Figure 6.12 A two-element antenna array.


The beamformer coefficients, $w_0$ and $w_1$, are selected (adapted), so that the difference

$$e(n) = d(n) - \mathbf{w}^T\mathbf{x}(n)$$

where $\mathbf{w} = [w_0\ w_1]^T$, is minimized in the mean-square sense. The error signal e(n) is the
beamformer output.

For a given set of the beamformer coefficients $w_0$ and $w_1$ and a signal arriving at an
angle $\theta$, the array power gain, $G(\theta)$, is defined as the ratio of the signal power in the
output e(n) to the signal power at one of the omnis. Assuming that a narrow-band signal
$\gamma(n)\cos n\omega_o$ is arriving at an angle $\theta$,

$$\begin{aligned}
e(n) &= \gamma(n)\left[\cos(n\omega_o - \pi\sin\theta) - w_0\cos n\omega_o - w_1\sin n\omega_o\right] \\
     &= \gamma(n)\left[(\cos(\pi\sin\theta) - w_0)\cos n\omega_o + (\sin(\pi\sin\theta) - w_1)\sin n\omega_o\right] \\
     &= a(\theta)\gamma(n)\sin(n\omega_o + \varphi(\theta))
\end{aligned} \tag{6.94}$$

where

$$a(\theta) = \sqrt{(\cos(\pi\sin\theta) - w_0)^2 + (\sin(\pi\sin\theta) - w_1)^2}$$

and

$$\varphi(\theta) = \tan^{-1}\left(\frac{\cos(\pi\sin\theta) - w_0}{\sin(\pi\sin\theta) - w_1}\right)$$

Using these, we get

$$G(\theta) = a^2(\theta) = (\cos(\pi\sin\theta) - w_0)^2 + (\sin(\pi\sin\theta) - w_1)^2 \tag{6.95}$$

$G(\theta)$, when plotted against the angle of arrival of the received signal, is called the directivity
pattern of the array (beamformer). The names beam pattern, array pattern, and spatial
response are also used to refer to $G(\theta)$. The directivity patterns are usually plotted in
polar coordinates.
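As a small illustration of Eq. (6.95), the gain can be evaluated on a grid of arrival angles. This is a sketch only: the helper names are hypothetical, and the weights used are an idealized choice that places an exact null at an assumed jammer angle of 45 degrees. They are not the LMS-adapted weights produced by the book's program.

```python
import math

def array_gain(theta, w0, w1):
    """Power gain G(theta) of Eq. (6.95); theta in radians."""
    phase = math.pi * math.sin(theta)
    return (math.cos(phase) - w0) ** 2 + (math.sin(phase) - w1) ** 2

def null_steering_weights(theta_o):
    """Idealized weights (an assumption) that make G(theta_o) exactly zero."""
    phase = math.pi * math.sin(theta_o)
    return math.cos(phase), math.sin(phase)

theta_o = math.radians(45.0)
w0, w1 = null_steering_weights(theta_o)
for deg in range(0, 360, 45):
    print(deg, round(array_gain(math.radians(deg), w0, w1), 4))
```

Because $G(\theta)$ depends on $\theta$ only through $\sin\theta$, the printed values repeat under $\theta \to \pi - \theta$, which is the mirror symmetry about the line through A and B.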

Figure 6.13 shows the directivity pattern of the two-element beamformer of Figure 6.12
when its coefficients have been adjusted near their optimal values using the LMS
algorithm. The following parameters have been used to obtain these results:

$$\theta_o = 45^\circ, \quad \sigma_\alpha^2 = 0.01, \quad \sigma_\beta^2 = 1$$

where $\sigma_\alpha^2$ and $\sigma_\beta^2$ are the variances of $\alpha(n)$ and $\beta(n)$, respectively. The results, as could
be predicted from the theory, show a clear deep null in the direction from which the jammer
arrives ($\theta = \theta_o$) and a reasonably good gain in the direction of the desired signal ($\theta = 0$).
The array pattern is symmetrical with respect to the line connecting A to B because of
the omnidirectional properties of the antennas. The MATLAB program used to obtain
this result is available on the accompanying website. It is called “bformer.m.” We
encourage the readers to try this program for different values of $\theta_o$, $\sigma_\alpha^2$, and $\sigma_\beta^2$. An
interesting observation that can be made is that a null is always produced in the direction
of arrival of the desired signal or jammer, whichever is stronger. The theoretical results
related to these observations can be found in Chapter 3, Section 3.6.5. The subject of
beamforming is presented in great detail in Chapter 18, under the more generic name
sensor array processing.

Figure 6.13 The directivity pattern of the two-element antenna array when a jammer arrives from
the direction 45° with respect to the desired signal, as defined in Figure 6.12.


6.5 Simplified LMS Algorithms 


Over the years, a number of modifications that simplify hardware implementation
of the LMS algorithm have been proposed (Hirsch and Wolf, 1970; Claasen and
Mecklenbrauker, 1981; Duttweiler, 1982). These simplifications are discussed in
this section. The most important members of this class of algorithms are:


Sign algorithm. This algorithm is obtained from the conventional LMS recursion (6.9) 
by replacing e(n) with its sign. This leads to the following recursion 


$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\,\mathrm{sign}(e(n))\,\mathbf{x}(n) \tag{6.96}$$


Because of replacement of e(n) by its sign, implementation of this recursion may be 
cheaper than the conventional LMS recursion, especially in high-speed applications 
where a hardware implementation of the adaptation recursion may be necessary. Fur- 
thermore, the step-size parameter is usually selected to be a power-of-two, so that no 
multiplication would be required for implementing the recursion (6.96). A set of shift 
and add/subtract operations would suffice to update the filter tap weights. 

Signed-Regressor algorithm. The signed-regressor algorithm is obtained from the con- 
ventional LMS recursion (6.9) by replacing the tap-input vector x(n) with the vector 
sign(x(n)), where the sign function is applied to the vector x(n) on element-by-element 
basis. The signed-regressor recursion is then 


$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\,e(n)\,\mathrm{sign}(\mathbf{x}(n)) \tag{6.97}$$


Although quite similar in form, the signed-regressor algorithm performs much better
than the sign algorithm. This will be shown later through a simulation example.

Sign-Sign algorithm. The sign-sign algorithm, as may be understood from its name,
combines the sign and signed-regressor recursions, resulting in the following
recursion:

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\,\mathrm{sign}(e(n))\,\mathrm{sign}(\mathbf{x}(n)) \tag{6.98}$$
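The three recursions differ from the conventional LMS update only in where the sign function is applied, which the following sketch makes explicit. The function and variant names are illustrative choices, not taken from the book's programs. With a power-of-two step size, the sign and sign-sign variants reduce to shift and add/subtract operations in hardware; the sketch does not model this.

```python
def sign(value):
    """Return +1.0, -1.0, or 0.0 according to the sign of value."""
    return float((value > 0) - (value < 0))

def lms_update(w, x, e, mu, variant="lms"):
    """One tap-weight update; x is the tap-input vector, e the output error."""
    if variant == "lms":                 # conventional LMS recursion
        return [wi + 2.0 * mu * e * xi for wi, xi in zip(w, x)]
    if variant == "sign":                # Eq. (6.96)
        return [wi + 2.0 * mu * sign(e) * xi for wi, xi in zip(w, x)]
    if variant == "signed_regressor":    # Eq. (6.97)
        return [wi + 2.0 * mu * e * sign(xi) for wi, xi in zip(w, x)]
    if variant == "sign_sign":           # Eq. (6.98)
        return [wi + 2.0 * mu * sign(e) * sign(xi) for wi, xi in zip(w, x)]
    raise ValueError("unknown variant: " + variant)
```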


It may be noted that even though in many practical cases all of the above algorithms
are likely to converge to the optimum Wiener-Hopf solution, this may not be true in
general. For example, the sign-sign algorithm converges toward a set of tap weights
that satisfy the equation

$$E[\mathrm{sign}(e(n)\mathbf{x}(n))] = \mathbf{0} \tag{6.99}$$

which in general may not be equivalent to the principle of orthogonality

$$E[e(n)\mathbf{x}(n)] = \mathbf{0} \tag{6.100}$$

which leads to the Wiener-Hopf equation. For instance, when the elements of the vector
x(n) are zero-mean but have a nonsymmetrical distribution around zero, the elements
of e(n)x(n) may also have a nonsymmetrical distribution around zero. In that case, it is
likely that the solutions to Eqs. (6.99) and (6.100) lead to two different sets of tap weights.
Nevertheless, we emphasize that in most practical applications, the scenario
just mentioned is unlikely to happen. Even if it happens, the solutions obtained
from Eqs. (6.99) and (6.100) are usually about the same.

To compare the performance of these algorithms with the conventional LMS algorithm
and among themselves, we run the system modeling problem that was introduced
in Section 6.4.1. Figure 6.14 shows the convergence behavior of the algorithms when
the input coloring filter $H(z) = H_1(z)$ is used and the step-size parameters for different
algorithms are selected experimentally, so that they all reach the same steady-state MSE.

Figure 6.14 Learning curves of the conventional LMS algorithm and its simplified versions
(conventional LMS, signed-regressor, sign, and sign-sign; horizontal axis: number of iterations).
Different step-size parameters are used. These have been selected experimentally, so that all algorithms
approach the same steady-state MSE.

From the results presented in Figure 6.14, we see that the performance of the
signed-regressor algorithm is only slightly worse than that of the conventional LMS algorithm.
However, the sign and sign-sign algorithms are both much slower than the conventional
LMS algorithm. Their convergence behavior is also rather peculiar: they converge
very slowly at the beginning but speed up as the MSE level drops. This can be explained
as follows.

Consider the sign algorithm recursion and note that it may be written as

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\frac{e(n)}{|e(n)|}\mathbf{x}(n) \tag{6.101}$$

as sign(e(n)) = e(n)/|e(n)|. This may be rearranged as

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\frac{\mu}{|e(n)|}e(n)\mathbf{x}(n) \tag{6.102}$$

Inspection of Eq. (6.102) reveals that the sign algorithm may be thought of as an LMS
algorithm with a variable step-size parameter $\mu'(n) = \mu/|e(n)|$. The step-size parameter
$\mu'(n)$ increases, on average, as the sign algorithm converges, because e(n) decreases in
magnitude. Thus, to keep the sign algorithm stable, with a small steady-state error, a
very small step-size parameter $\mu$ has to be used. Choosing a very small $\mu$ leads to an
equally small value (on average) for $\mu'(n)$ in the initial portion of the sign algorithm.
This explains why the sign algorithm initially converges very slowly. However, as
the algorithm converges and e(n) becomes smaller in magnitude, the step-size parameter
$\mu'(n)$ becomes larger, on average, and this leads to a faster convergence
of the algorithm. A rigorous analysis of the sign algorithm for a nonstationary case can
be found in Eweda (1990b).

The same procedure may be followed to explain the behavior of the signed-regressor
algorithm. In this case, each tap of the filter is controlled by a separate variable
step-size parameter. In particular, the step-size parameter of the ith tap of the filter at the nth
iteration is $\mu_i(n) = \mu/|x(n-i)|$, where $\mu$ is a common parameter to all taps. The
fundamental difference between the variable step-size parameters, $\mu_i(n)$'s, here and what was
observed above for the sign algorithm is that in the present case, the variations of the $\mu_i(n)$'s
are independent of the filter convergence. The selection of the common parameter $\mu$ is
based on the average size of |x(n)|. This leads to a more homogeneous convergence of
the signed-regressor algorithm when compared with the sign algorithm. In fact, the
analysis of the signed-regressor algorithm given by Eweda (1990a) shows that for Gaussian
signals, the convergence behavior of the signed-regressor algorithm is very similar to that of the
conventional LMS algorithm. The replacement of the x(n − i) terms by their signs leads to
an increase of the time constants of the algorithm learning curve by a fixed factor of $\pi/2$.
This increases the convergence time of the signed-regressor algorithm by the
same factor when it is compared with the conventional LMS algorithm. Problem P6.16
contains the necessary theoretical elements, which lead to this result.

Another interesting proposal, which also leads to some simplification of the LMS
algorithm, was suggested by Duttweiler (1982). He suggested that in calculating the
gradient vector e(n)x(n), e(n) and/or x(n) may be quantized to their respective nearest
power-of-two. This leads to an algorithm that performs very similarly to the conventional
LMS algorithm.


6.6 Normalized LMS Algorithm 


The normalized LMS (NLMS) algorithm may be viewed as a special implementation of the
LMS algorithm that takes into account the variation of the signal level at the filter input 
and selects a normalized step-size parameter, which results in a stable as well as fast 
converging adaptation algorithm. The NLMS algorithm may be developed from different 
viewpoints. Goodwin and Sin (1984) formulated the NLMS algorithm as a constrained 
optimization problem; see also Haykin (1991). Nitzberg (1985) obtained the NLMS recur- 
sion by running the conventional LMS algorithm many times, for every new sample of 
the input. Here, we start with a rather straightforward derivation of the NLMS recursion 
and later show that the recursion obtained satisfies the constrained optimization criterion 
of Goodwin and Sin and also that it matches the result of Nitzberg. 
We consider the LMS recursion

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu(n)e(n)\mathbf{x}(n) \tag{6.103}$$

where the step-size parameter $\mu(n)$ is time-varying. We select $\mu(n)$, so that the a posteriori
error

$$e^+(n) = d(n) - \mathbf{w}^T(n+1)\mathbf{x}(n) \tag{6.104}$$

is minimized in magnitude. Substituting Eq. (6.103) in Eq. (6.104) and rearranging, we
obtain

$$e^+(n) = \left(1 - 2\mu(n)\mathbf{x}^T(n)\mathbf{x}(n)\right)e(n) \tag{6.105}$$

Minimizing $(e^+(n))^2$ with respect to $\mu(n)$ results in the following:

$$\mu(n) = \frac{1}{2\mathbf{x}^T(n)\mathbf{x}(n)} \tag{6.106}$$

which forces $e^+(n)$ to zero. Substituting Eq. (6.106) in Eq. (6.103), we obtain

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{1}{\mathbf{x}^T(n)\mathbf{x}(n)}e(n)\mathbf{x}(n) \tag{6.107}$$

This is the NLMS recursion. When this is combined with the filtering equation (6.1) and
the error estimation equation (6.2), we obtain the NLMS algorithm.
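The property that the step size of Eq. (6.106) drives the a posteriori error to zero is easy to confirm numerically. The sketch below performs a single update of Eq. (6.107); the function name and the returned tuple are illustrative choices, and the tap-input vector is assumed to be nonzero.

```python
def nlms_step(w, x, d):
    """One NLMS update, Eq. (6.107); returns (w_next, e, e_post)."""
    y = sum(wi * xi for wi, xi in zip(w, x))     # filter output
    e = d - y                                    # a priori error
    energy = sum(xi * xi for xi in x)            # x^T(n) x(n), assumed nonzero
    w_next = [wi + e * xi / energy for wi, xi in zip(w, x)]
    # A posteriori error of Eq. (6.104), computed with the updated weights
    e_post = d - sum(wi * xi for wi, xi in zip(w_next, x))
    return w_next, e, e_post
```

For example, with w = [0.5, -1, 2], x = [1, 2, -1] and d = 3, the a priori error is 6.5, while the a posteriori error vanishes up to rounding.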

There have been a variety of interpretations of the NLMS algorithm. We review some
of these in the following, as they can help enhance our understanding of this algorithm.


1. The use of $\mu(n)$ as in Eq. (6.106) is appealing as it selects a step-size parameter
proportional to the inverse of the instantaneous signal sample energy at the adaptive
filter input. This matches the misadjustment equation (6.64), which suggests that the
step-size parameter of the LMS algorithm should be selected proportional to the inverse
of the average total energy at the filter tap inputs. Note that

$$\mathrm{tr}[\mathbf{R}] = \sum_{i=0}^{N-1} E[x^2(n-i)] = E\left[\sum_{i=0}^{N-1} x^2(n-i)\right]$$

and $\sum_{i=0}^{N-1} x^2(n-i) = \mathbf{x}^T(n)\mathbf{x}(n)$ is the total instantaneous signal energy at the filter tap inputs.

2. The NLMS recursion (6.107) is equivalent to running the LMS recursion, for every new
input sample, for many iterations until it converges (Nitzberg, 1985); see Problem P6.17.

3. The NLMS recursion may also be derived by solving the following constrained
optimization problem (Goodwin and Sin, 1984):


Given the tap-input vector x(n) and the desired output sample d(n), choose the updated
tap-weight vector w(n + 1) so as to minimize the squared Euclidean norm of the
difference

$$\boldsymbol{\eta}(n) = \mathbf{w}(n+1) - \mathbf{w}(n) \tag{6.108}$$

subject to the constraint

$$\mathbf{w}^T(n+1)\mathbf{x}(n) = d(n) \tag{6.109}$$

Observe that the solution given by Eq. (6.107) satisfies the constraint (6.109). Hence,
define $\boldsymbol{\eta}_{\rm NLMS}(n)$ as

$$\boldsymbol{\eta}_{\rm NLMS}(n) = \mathbf{w}(n+1) - \mathbf{w}(n) = \frac{1}{\mathbf{x}^T(n)\mathbf{x}(n)}e(n)\mathbf{x}(n) \tag{6.110}$$


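We will now show that $\boldsymbol{\eta}_{\rm NLMS}(n)$ is indeed the solution to the problem posed
above.

Let the optimum $\boldsymbol{\eta}(n)$ be given by

$$\boldsymbol{\eta}_o(n) = \boldsymbol{\eta}_{\rm NLMS}(n) + \boldsymbol{\eta}_1(n) \tag{6.111}$$

where $\boldsymbol{\eta}_1(n)$ indicates any difference that may exist between $\boldsymbol{\eta}_o(n)$ and $\boldsymbol{\eta}_{\rm NLMS}(n)$. As the
updated vector $\mathbf{w}(n+1) = \mathbf{w}(n) + \boldsymbol{\eta}_{\rm NLMS}(n)$ satisfies the constraint (6.109), we get

$$(\mathbf{w}(n) + \boldsymbol{\eta}_{\rm NLMS}(n))^T\mathbf{x}(n) = d(n) \tag{6.112}$$

The tap-weight vector $\mathbf{w}(n+1) = \mathbf{w}(n) + \boldsymbol{\eta}_o(n)$ also satisfies the constraint (6.109), since
$\boldsymbol{\eta}_o(n)$ is the optimum solution. Thus,

$$(\mathbf{w}(n) + \boldsymbol{\eta}_o(n))^T\mathbf{x}(n) = d(n) \tag{6.113}$$

Subtracting Eq. (6.112) from Eq. (6.113) and using Eq. (6.111), we get

$$\boldsymbol{\eta}_1^T(n)\mathbf{x}(n) = 0 \tag{6.114}$$

Multiplying the left- and right-hand sides of Eq. (6.111) by their respective transposes,
from left, we obtain

$$\begin{aligned}
\boldsymbol{\eta}_o^T(n)\boldsymbol{\eta}_o(n) &= (\boldsymbol{\eta}_{\rm NLMS}(n) + \boldsymbol{\eta}_1(n))^T(\boldsymbol{\eta}_{\rm NLMS}(n) + \boldsymbol{\eta}_1(n)) \\
&= \boldsymbol{\eta}_{\rm NLMS}^T(n)\boldsymbol{\eta}_{\rm NLMS}(n) + \boldsymbol{\eta}_1^T(n)\boldsymbol{\eta}_1(n) + 2\boldsymbol{\eta}_{\rm NLMS}^T(n)\boldsymbol{\eta}_1(n)
\end{aligned} \tag{6.115}$$

Premultiplying Eq. (6.110) by $\boldsymbol{\eta}_1^T(n)$ and using Eq. (6.114), we obtain

$$\boldsymbol{\eta}_1^T(n)\boldsymbol{\eta}_{\rm NLMS}(n) = 0 \tag{6.116}$$

Substituting Eq. (6.116) in Eq. (6.115), we obtain

$$\boldsymbol{\eta}_o^T(n)\boldsymbol{\eta}_o(n) = \boldsymbol{\eta}_{\rm NLMS}^T(n)\boldsymbol{\eta}_{\rm NLMS}(n) + \boldsymbol{\eta}_1^T(n)\boldsymbol{\eta}_1(n) \tag{6.117}$$

This suggests that the squared Euclidean norm of the vector $\boldsymbol{\eta}_o(n)$, that is, $\boldsymbol{\eta}_o^T(n)\boldsymbol{\eta}_o(n)$,
attains its minimum when the squared Euclidean norm of the vector $\boldsymbol{\eta}_1(n)$ is minimum.
This, of course, is achieved when $\boldsymbol{\eta}_1(n) = \mathbf{0}$. Thus, we obtain

$$\boldsymbol{\eta}_o(n) = \boldsymbol{\eta}_{\rm NLMS}(n) \tag{6.118}$$

This completes our proof.⁵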
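The argument can also be checked numerically. The sketch below uses arbitrary made-up values and hypothetical helper names. It adds random perturbations orthogonal to x(n), as required by Eq. (6.114), to the NLMS increment and confirms that the constraint (6.109) still holds while the squared norm of the increment never decreases.

```python
import random

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def nlms_increment(w, x, d):
    """eta_NLMS(n) of Eq. (6.110)."""
    e = d - dot(w, x)
    scale = e / dot(x, x)
    return [scale * xi for xi in x]

def orthogonal_to(x, rng):
    """Random vector eta_1 with eta_1^T x = 0, as in Eq. (6.114)."""
    r = [rng.gauss(0.0, 1.0) for _ in x]
    c = dot(r, x) / dot(x, x)
    return [ri - c * xi for ri, xi in zip(r, x)]

rng = random.Random(1)
w = [0.2, -0.4, 0.1, 0.7]          # arbitrary current tap weights
x = [1.0, -2.0, 0.5, 3.0]          # arbitrary tap-input vector
d = 1.5                            # arbitrary desired sample
eta_nlms = nlms_increment(w, x, d)
for _ in range(5):
    eta_1 = orthogonal_to(x, rng)
    eta = [a + b for a, b in zip(eta_nlms, eta_1)]
    w_alt = [wi + ei for wi, ei in zip(w, eta)]
    assert abs(dot(w_alt, x) - d) < 1e-9              # constraint (6.109) holds
    assert dot(eta, eta) >= dot(eta_nlms, eta_nlms)   # the step is never shorter
```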

Figure 6.15 gives a geometrical interpretation of the above result. The tap-weight vector
w(n) is represented by a point. The constraint $\mathbf{w}^T(n+1)\mathbf{x}(n) = d(n)$ limits w(n + 1) to
the points in a subspace whose dimension is one less than the filter length, N, that is,
N − 1. This is represented as a plane in Figure 6.15. The vector $\boldsymbol{\eta}_{\rm NLMS}(n)$ is orthogonal
to this subspace. It is also the vector connecting the point associated with w(n) to its
projection on the subspace. This shows that $\boldsymbol{\eta}_{\rm NLMS}(n)$ is the minimum length vector
that results in the updated tap-weight vector $\mathbf{w}(n+1) = \mathbf{w}(n) + \boldsymbol{\eta}_{\rm NLMS}(n)$ subject to
the constraint (6.109).


5 The above results could also be derived by the application of the method of Lagrange multipliers; see
Section 6.10.1 for an example of the use of Lagrange multipliers. Here, we have chosen to give a direct derivation
of the results from the first principles of vector calculus. This derivation is also instructive because its application
leads to the geometrical interpretation of the NLMS recursion depicted in Figure 6.15.

Figure 6.15 Geometrical interpretation of the NLMS recursion.

Despite its appealing interpretations, the NLMS recursion (6.107) is seldom used
in actual applications. Instead, it is often observed that the following relaxed recursion
results in a more reliable implementation of adaptive filters:

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\tilde{\mu}}{\mathbf{x}^T(n)\mathbf{x}(n) + \psi}e(n)\mathbf{x}(n) \tag{6.119}$$

In this recursion, $\tilde{\mu}$ and $\psi$ are positive constants, which should be selected appropriately.
The rationale for the introduction of the constant $\psi$ is to prevent division by a small
value when the squared Euclidean norm $\mathbf{x}^T(n)\mathbf{x}(n)$ is small. This results in a more stable
implementation of the NLMS algorithm. The constant $\tilde{\mu}$ may be thought of as a
step-size parameter, which controls the rate of convergence of the algorithm and also its
misadjustment. We also note that the recursion (6.119) reduces to Eq. (6.107) when
$\tilde{\mu} = 1$ and $\psi = 0$. Table 6.2 gives a summary of the NLMS algorithm.
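As an illustration of the relaxed recursion (6.119), the sketch below identifies a short FIR plant from white Gaussian input. The plant coefficients, the noise level, and the values of mu_tilde and psi are arbitrary assumptions chosen for the example; the function name is hypothetical.

```python
import random

def nlms_identify(plant, num_iter=2000, mu_tilde=0.5, psi=1e-6, seed=0):
    """Identify an FIR plant with the relaxed NLMS recursion (6.119)."""
    rng = random.Random(seed)
    N = len(plant)
    w = [0.0] * N                      # adaptive filter tap weights
    x = [0.0] * N                      # tap-input vector [x(n), ..., x(n-N+1)]
    for _ in range(num_iter):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]           # shift in a new sample
        # Desired signal: plant output plus a small observation noise
        d = sum(p * xi for p, xi in zip(plant, x)) + rng.gauss(0.0, 0.001)
        e = d - sum(wi * xi for wi, xi in zip(w, x))
        gain = mu_tilde / (sum(xi * xi for xi in x) + psi)
        w = [wi + gain * e * xi for wi, xi in zip(w, x)]
    return w

print(nlms_identify([1.0, -0.5, 0.25, 0.1]))
```

The constant psi matters mostly in the first few iterations here, when the tap-input vector is still nearly empty and its energy can be very small.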


Table 6.2 Summary of the normalized LMS algorithm.

Input:  Tap-weight vector, w(n),
        input vector, x(n),
        and desired output, d(n)
Output: Filter output, y(n),
        tap-weight vector update, w(n + 1)

1. Filtering:
   $y(n) = \mathbf{w}^T(n)\mathbf{x}(n)$

2. Error estimation:
   $e(n) = d(n) - y(n)$

3. Tap-weight vector adaptation:
   $\mathbf{w}(n+1) = \mathbf{w}(n) + \dfrac{\tilde{\mu}}{\mathbf{x}^T(n)\mathbf{x}(n) + \psi}e(n)\mathbf{x}(n)$


6.7 Affine Projection LMS Algorithm


The affine projection LMS (APLMS) algorithm, also called the generalized NLMS algorithm,
is in fact a generalization of the NLMS algorithm. Following Goodwin and Sin's
(1984) formulation of the NLMS, the APLMS algorithm is obtained by solving the following
constrained optimization problem:


Given the set of tap-input vectors x(n), x(n − 1), ..., x(n − M + 1) and the set of desired
output samples d(n), d(n − 1), ..., d(n − M + 1), choose the updated tap-weight vector
w(n + 1) so as to minimize the squared Euclidean norm of the difference

$$\boldsymbol{\eta}(n) = \mathbf{w}(n+1) - \mathbf{w}(n) \tag{6.120}$$

subject to the set of constraints

$$\mathbf{w}^T(n+1)\mathbf{x}(n-k) = d(n-k), \quad k = 0, 1, \ldots, M-1 \tag{6.121}$$


This problem can be best solved using the method of Lagrange multipliers. To this end,
we define the N × M matrix

$$\mathbf{X}(n) = [\mathbf{x}(n)\ \mathbf{x}(n-1)\ \cdots\ \mathbf{x}(n-M+1)] \tag{6.122}$$

and the length M column vector

$$\mathbf{d}(n) = [d(n)\ d(n-1)\ \cdots\ d(n-M+1)]^T \tag{6.123}$$

and note that the set of constraints (6.121) can be written as

$$\mathbf{X}^T(n)\mathbf{w}(n+1) = \mathbf{d}(n) \tag{6.124}$$

Then, following the method of Lagrange multipliers, we define

$$\xi^c = \|\mathbf{w} - \mathbf{w}(n)\|^2 + (\mathbf{X}^T(n)\mathbf{w} - \mathbf{d}(n))^T\boldsymbol{\lambda} \tag{6.125}$$

where $\boldsymbol{\lambda}$ is a column vector of the Lagrange multipliers $\lambda_0, \lambda_1, \ldots, \lambda_{M-1}$.
The solution to the above constrained optimization problem is obtained by forming and
solving the system of equations

$$\nabla_{\mathbf{w}}\xi^c = \mathbf{0} \tag{6.126}$$

and

$$\nabla_{\boldsymbol{\lambda}}\xi^c = \mathbf{0} \tag{6.127}$$


We note that Eq. (6.127) reduces to the constraints (6.124), and Eq. (6.126) leads to the
minimization of the norm $\|\mathbf{w}(n+1) - \mathbf{w}(n)\|^2$.
Noting that $\|\mathbf{w} - \mathbf{w}(n)\|^2 = (\mathbf{w} - \mathbf{w}(n))^T(\mathbf{w} - \mathbf{w}(n))$ and substituting Eq. (6.125) in
Eqs. (6.126) and (6.127), we obtain

$$2(\mathbf{w}(n+1) - \mathbf{w}(n)) + \mathbf{X}(n)\boldsymbol{\lambda} = \mathbf{0} \tag{6.128}$$

and

$$\mathbf{X}^T(n)\mathbf{w}(n+1) - \mathbf{d}(n) = \mathbf{0} \tag{6.129}$$

respectively. Note that here w is replaced by w(n + 1) because the solution to Eqs. (6.128)
and (6.129) is indeed w(n + 1).

Multiplying Eq. (6.128) from left by $\mathbf{X}^T(n)$, noting that according to Eq. (6.129),
$\mathbf{X}^T(n)\mathbf{w}(n+1) = \mathbf{d}(n)$, and defining

$$\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{X}^T(n)\mathbf{w}(n) \tag{6.130}$$

we obtain

$$\boldsymbol{\lambda} = -2(\mathbf{X}^T(n)\mathbf{X}(n))^{-1}\mathbf{e}(n) \tag{6.131}$$

Substituting Eq. (6.131) in Eq. (6.128) and rearranging the result, we obtain

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mathbf{X}(n)(\mathbf{X}^T(n)\mathbf{X}(n))^{-1}\mathbf{e}(n) \tag{6.132}$$

This is the APLMS update equation.
For the same reasons as in the case of the NLMS algorithm, in practice, Eq. (6.132)
is often replaced by its relaxed form

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \tilde{\mu}\mathbf{X}(n)(\mathbf{X}^T(n)\mathbf{X}(n) + \psi\mathbf{I})^{-1}\mathbf{e}(n) \tag{6.133}$$

where $\tilde{\mu}$ is a step-size parameter, I is the M × M identity matrix, and $\psi$ is a small
positive constant that ensures the numerical stability of the algorithm when $\mathbf{X}^T(n)\mathbf{X}(n)$
is a near singular matrix. Table 6.3 presents a summary of the APLMS algorithm.


Table 6.3 Summary of the affine projection LMS algorithm.

Input:  Tap-weight vector, w(n),
        input matrix, $\mathbf{X}(n) = [\mathbf{x}(n)\ \mathbf{x}(n-1)\ \cdots\ \mathbf{x}(n-M+1)]$,
        and desired output vector, $\mathbf{d}(n) = [d(n)\ d(n-1)\ \cdots\ d(n-M+1)]^T$
Output: Filter output, y(n),
        tap-weight vector update, w(n + 1)

1. Filtering:
   $\mathbf{y}(n) = \mathbf{X}^T(n)\mathbf{w}(n)$, y(n) = the first element of $\mathbf{y}(n)$

2. Error estimation:
   $\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{y}(n)$

3. Tap-weight vector adaptation:
   $\mathbf{w}(n+1) = \mathbf{w}(n) + \tilde{\mu}\mathbf{X}(n)(\mathbf{X}^T(n)\mathbf{X}(n) + \psi\mathbf{I})^{-1}\mathbf{e}(n)$


For M = 1, the APLMS algorithm reduces to the NLMS algorithm. However, the
APLMS algorithm offers a significant convergence improvement over the NLMS
algorithm as M increases. Clearly, this improvement comes at the cost of additional
computational complexity. A comparison of the number of operations in Tables 6.2
and 6.3 reveals that the APLMS algorithm is at least M times more complex than
the NLMS algorithm. This does not include the computation and inversion of the
M × M matrix $\mathbf{X}^T(n)\mathbf{X}(n) + \psi\mathbf{I}$. Nevertheless, further studies reveal that the tap-weight
vector adaptation (6.133) may be executed once after every M samples, without any
significant loss in the convergence behavior, which brings its complexity down to a
level comparable to that of the NLMS algorithm. A summary of various versions of the
APLMS algorithm can be found in Morgan and Kratzer (1996).

A convergence analysis of the APLMS algorithm is presented in Shin and Sayed (2004).
The following observations are made from the results presented in this work. (i) The
convergence rate of the APLMS algorithm improves as M increases. (ii) The misadjustment
of the APLMS algorithm, on the other hand, increases (i.e., degrades) as M increases. Here,
without going into the details of the mathematical derivations, we attempt
to explain these observations intuitively.

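Using the matrix inversion lemma formula (6.60), one can show that

$$\mathbf{X}(n)(\mathbf{X}^T(n)\mathbf{X}(n) + \psi\mathbf{I})^{-1} = (\mathbf{X}(n)\mathbf{X}^T(n) + \psi\mathbf{I})^{-1}\mathbf{X}(n) \tag{6.134}$$

where the identity matrix is M × M on the left-hand side and N × N on the right-hand side.
Substituting Eq. (6.134) in Eq. (6.133), we obtain

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \tilde{\mu}(\mathbf{X}(n)\mathbf{X}^T(n) + \psi\mathbf{I})^{-1}\mathbf{X}(n)\mathbf{e}(n) \tag{6.135}$$

Examining Eq. (6.135), one finds that $\mathbf{X}(n)\mathbf{e}(n) = \sum_{i=0}^{M-1}\mathbf{x}(n-i)e_i(n)$, where $e_i(n)$
denotes the ith element of $\mathbf{e}(n)$, and this
in turn implies that $\mathbf{X}(n)\mathbf{e}(n)$ is a random vector whose mean is equal to $-\frac{M}{2}\nabla_{\mathbf{w}}\xi$.
Also, as M increases, the mean of the N × N matrix $\mathbf{X}(n)\mathbf{X}^T(n)$ approaches $M\mathbf{R}$.
Hence, one may think of $(\mathbf{X}(n)\mathbf{X}^T(n) + \psi\mathbf{I})^{-1}\mathbf{X}(n)\mathbf{e}(n)$ as a noisy (and regularized)
sample of $-\frac{1}{2}\mathbf{R}^{-1}\nabla_{\mathbf{w}}\xi$, and, thus, argue that the update equation of the APLMS algorithm
attempts to implement a stochastic version of Newton's method. In other words,
one may argue that the premultiplication of the stochastic gradient vector $\mathbf{X}(n)\mathbf{e}(n)$ by
$(\mathbf{X}(n)\mathbf{X}^T(n) + \psi\mathbf{I})^{-1}$ results in a vector that, on average, points toward the minimum
of the performance surface of the adaptive filter, hence avoiding the slow modes of
convergence of the LMS algorithm.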
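Identity (6.134) can also be verified directly, without invoking the matrix inversion lemma. The short derivation below is offered as an alternative check rather than as the route the text has in mind. It assumes $\psi > 0$, so that both regularized matrices are positive definite and hence invertible, and it writes $\mathbf{I}_M$ and $\mathbf{I}_N$ to make the two identity-matrix sizes explicit.

```latex
\begin{align*}
\mathbf{X}(n)\bigl(\mathbf{X}^T(n)\mathbf{X}(n) + \psi\mathbf{I}_M\bigr)
  &= \mathbf{X}(n)\mathbf{X}^T(n)\mathbf{X}(n) + \psi\mathbf{X}(n) \\
  &= \bigl(\mathbf{X}(n)\mathbf{X}^T(n) + \psi\mathbf{I}_N\bigr)\mathbf{X}(n).
\end{align*}
% Premultiply both sides by (X X^T + psi I_N)^{-1} and
% postmultiply both sides by (X^T X + psi I_M)^{-1}:
\[
\bigl(\mathbf{X}(n)\mathbf{X}^T(n) + \psi\mathbf{I}_N\bigr)^{-1}\mathbf{X}(n)
  = \mathbf{X}(n)\bigl(\mathbf{X}^T(n)\mathbf{X}(n) + \psi\mathbf{I}_M\bigr)^{-1}.
\]
```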

To explain why the misadjustment of the APLMS algorithm increases with M, we resort
to a generalization of the geometrical interpretation of the NLMS algorithm that was
presented earlier in Figure 6.15. Figure 6.16 presents a diagram that expands Figure 6.15
to the case where M = 2. Here, to satisfy the pair of constraints $\mathbf{w}^T\mathbf{x}(n) = d(n)$ and
$\mathbf{w}^T\mathbf{x}(n-1) = d(n-1)$, while minimizing $\|\boldsymbol{\eta}_{\rm APLMS}(n)\|$, the error vector $\boldsymbol{\eta}_{\rm APLMS}(n)$ must
be orthogonal to the intersection of the subspaces of the two constraints. Hence, one
may note that $\boldsymbol{\eta}_{\rm APLMS}(n)$ is not necessarily orthogonal to the subspace of the constraint
$\mathbf{w}^T\mathbf{x}(n) = d(n)$ and thus $\|\boldsymbol{\eta}_{\rm APLMS}(n)\| \ge \|\boldsymbol{\eta}_{\rm NLMS}(n)\|$. Obviously, $\|\boldsymbol{\eta}_{\rm APLMS}(n)\|$ increases
further as M is given larger values. On the other hand, we recall that $\boldsymbol{\eta}_{\rm NLMS}(n)$ and,
similarly, $\boldsymbol{\eta}_{\rm APLMS}(n)$ may be thought of as a perturbation that is imposed on the filter
tap weights as the NLMS and APLMS algorithms proceed. In the steady state, a larger
perturbation of the tap weights results in a larger misadjustment. A few problems
at the end of this chapter guide the reader to develop a better understanding of the NLMS
and APLMS algorithms through a sequence of computer simulations.

Figure 6.16 Geometrical interpretation of the affine projection LMS recursion.


6.8 Variable Step-Size LMS Algorithm 


The analysis presented in Section 6.3 shows that the step-size parameter, u, plays a 
significant role in controlling the performance of the LMS algorithm. On the one hand, 
the speed of convergence of the LMS algorithm changes in proportion to its step-size
parameter. As a result, a large step-size parameter may be required to minimize the 
transient time of the LMS algorithm. On the other hand, to achieve a small misadjustment, 
a small step-size parameter has to be used. These are conflicting requirements and, thus, a 
compromise solution has to be adopted. The variable step-size LMS (VSLMS) algorithm, 
which is introduced in this section, is an effective solution to this problem (Shin and Lee, 
1985; Harris, Chabries, and Bishop, 1986). 

The VSLMS algorithm is based on a simple heuristic that comes from the mechanism
of the LMS algorithm. Each tap of the adaptive filter is given a separate time-varying
step-size parameter and the LMS recursion is written as

$$w_i(n+1) = w_i(n) + 2\mu_i(n)e(n)x(n-i), \quad \text{for } i = 0, 1, \ldots, N-1 \tag{6.136}$$

where $w_i(n)$ is the ith element of the tap-weight vector w(n) and $\mu_i(n)$ is its associated
step-size parameter at iteration n. The adjustment of the step-size parameter $\mu_i(n)$ is done
as follows. The corresponding stochastic gradient term $g_i(n) = e(n)x(n-i)$ is monitored
over the successive iterations of the algorithm, and $\mu_i(n)$ is increased if the latter term
consistently shows a positive or negative direction. This happens when the adaptive filter
has not yet converged. As the adaptive filter tap weights converge to some vicinity of
their optimum values, the averages of the stochastic gradient terms approach zero and
hence they change signs more frequently. This is detected by the algorithm and the
corresponding step-size parameters are gradually reduced to some minimum values. If the
situation changes and the algorithm begins to hunt for a new optimum point, the gradient
terms will indicate consistent (positive or negative) directions, resulting in an increase of
the corresponding step-size parameters. To ensure that the step-size parameters do not
become too large (which may result in system instability) or too small (which may
result in a slow reaction of the system to sudden changes), upper and lower limits should
be specified for each step-size parameter.

Following the above argument, the VSLMS algorithm step-size parameters, $\mu_i(n)$'s,
may be adjusted using the recursions

$$\mu_i(n) = \mu_i(n-1) + \rho\,\mathrm{sign}[g_i(n)]\,\mathrm{sign}[g_i(n-1)] \tag{6.137}$$

where $\rho$ is a small positive step-size parameter. The “sign” functions may be dropped
from Eq. (6.137). This results in the following alternative step-size parameter update
equation:

$$\mu_i(n) = \mu_i(n-1) + \rho\,g_i(n)g_i(n-1) \tag{6.138}$$

Both update equations (6.137) and (6.138) work well in practice. Which of the two 
choices works better is application dependent. The choice of one over the other may also 
be decided based on the available hardware/software platform on which the algorithm is 
to be implemented. For instance, if a digital signal processor is being used, the recursion 
(6.138) may be much easier to implement. On the other hand, if a custom chip is to be 
designed, the update equation (6.137) may be preferred. 

Derivation of an inequality similar to Eq. (6.74) to determine the range of the step-size
parameters that ensure the stability of the VSLMS algorithm is rather difficult because
of the time variation of the step-size parameters. Here, we adopt a simple approach: we
assume that the step-size parameters vary slowly, so that for the stability analysis
they may be treated as fixed, and we use the analogy between the resulting VSLMS
algorithm equations and those of the conventional LMS algorithm to arrive at a result that
computer simulations have shown to be reasonable. Further results on the VSLMS
algorithm misadjustment and its tracking behavior, along with computer simulation results,
can be found in Chapter 14.

The set of update equations (6.136) may be written in vector form as 


w(n + 1) = w(n) + 2m (ny)e(n)x(n) (6.139) 


where u(n) is a diagonal matrix consisting of the step-size parameters jig(n), y(n), ..., 
[ky—,(”). Equation (6.139) may further be rearranged as 


vn+1)=d- 2p(n)x(n)x! (n))v(n) + 2pu(n)e,(n)x(n) (6.140) 


where notations follow those of Section 6.2. Comparing Eq. (6.140) with Eq. (6.13) 
and the subsequent discussions on the stability of the conventional LMS algorithm in 
Section 6.3, we may argue that to ensure the stability of the VSLMS algorithm, the scalar 
step-size parameter u in Eq. (6.74) should be replaced by the diagonal matrix p(n). 
Table 6.4 Summary of an implementation of the variable step-size LMS algorithm.

Input:   Tap-weight vector, $\mathbf{w}(n)$, input vector, $\mathbf{x}(n)$,
         gradient terms $g_0(n-1), g_1(n-1), \ldots, g_{N-1}(n-1)$,
         step-size parameters $\mu_0(n-1), \mu_1(n-1), \ldots, \mu_{N-1}(n-1)$,
         and desired output, $d(n)$
Output:  Filter output, $y(n)$, tap-weight vector update, $\mathbf{w}(n+1)$,
         gradient terms $g_0(n), g_1(n), \ldots, g_{N-1}(n)$,
         and updated step-size parameters $\mu_0(n), \mu_1(n), \ldots, \mu_{N-1}(n)$

1. Filtering:
      $y(n) = \mathbf{w}^T(n)\mathbf{x}(n)$
2. Error estimation:
      $e(n) = d(n) - y(n)$
3. Tap weights and step-size parameters adaptation:
      For $i = 0, 1, \ldots, N-1$
         $g_i(n) = e(n)x(n-i)$
         $\mu_i(n) = \mu_i(n-1) + \rho\,\mathrm{sign}[g_i(n)]\,\mathrm{sign}[g_i(n-1)]$
         if $\mu_i(n) > \mu_{\max}$, $\mu_i(n) = \mu_{\max}$
         if $\mu_i(n) < \mu_{\min}$, $\mu_i(n) = \mu_{\min}$
         $w_i(n+1) = w_i(n) + 2\mu_i(n)g_i(n)$
      end



This leads to the inequality⁶

$$\mathrm{tr}[\boldsymbol{\mu}(n)\mathbf{R}] < \frac{1}{3} \tag{6.141}$$

as a sufficient condition, which ensures the stability of the VSLMS algorithm. Although
the inequality (6.141) may be used to impose some bounds on the step-size parameters
$\mu_i(n)$ dynamically as the adaptation of the filter proceeds, this leads to a rather
complicated process. Instead, in practice, one usually prefers to use Eq. (6.74) to limit all
$\mu_i(n)$ to the same maximum value, say $\mu_{\max}$.

The minimum bound, which may be imposed on the variable step-size parameters
$\mu_i(n)$, can be as low as zero. However, in actual practice, a positive bound is usually
used, so that the adaptation process will be on all the time and possible variations in the
adaptive filter optimum tap weights can always be tracked. Here, we use the notation
$\mu_{\min}$ to refer to this lower bound. Table 6.4 gives the summary of an implementation of
the VSLMS algorithm.
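
The steps of Table 6.4 can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's MATLAB code: the function name `vslms`, the three-tap plant, the noise-free desired signal, and all numerical settings (`rho`, `mu_init`, `mu_min`, `mu_max`) are assumptions chosen for the demonstration. The upper limit is set below $1/(3\,\mathrm{tr}[\mathbf{R}])$, in the spirit of Eq. (6.74).

```python
import random

def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def vslms(x, d, num_taps, rho, mu_init, mu_min, mu_max):
    """Variable step-size LMS following the steps of Table 6.4 (sign-based update)."""
    w = [0.0] * num_taps           # tap weights w_i(n)
    mu = [mu_init] * num_taps      # per-tap step sizes mu_i(n)
    g_prev = [0.0] * num_taps      # gradient terms g_i(n-1)
    errors = []
    for n in range(len(x)):
        # Tap-input vector [x(n), x(n-1), ...]; samples before n = 0 are taken as zero.
        xn = [x[n - i] if n - i >= 0 else 0.0 for i in range(num_taps)]
        y = sum(wi * xi for wi, xi in zip(w, xn))    # 1. filtering
        e = d[n] - y                                 # 2. error estimation
        for i in range(num_taps):                    # 3. adaptation
            g = e * xn[i]
            mu[i] += rho * sign(g) * sign(g_prev[i])
            mu[i] = min(max(mu[i], mu_min), mu_max)  # clamp to [mu_min, mu_max]
            w[i] += 2.0 * mu[i] * g
            g_prev[i] = g
        errors.append(e)
    return w, mu, errors

# Hypothetical system-modeling test: identify a 3-tap plant from white, unit-variance input.
random.seed(0)
plant = [0.8, -0.4, 0.2]
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(plant[i] * x[n - i] for i in range(3) if n - i >= 0) for n in range(len(x))]
mu_min, mu_max = 0.005, 0.05       # mu_max < 1/(3 tr[R]) = 1/9 for this input
w, mu, errors = vslms(x, d, 3, rho=1e-3, mu_init=0.01, mu_min=mu_min, mu_max=mu_max)
```

Replacing the sign-based line by `mu[i] += rho * g * g_prev[i]` would give the product form of Eq. (6.138).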


6.9 LMS Algorithm for Complex-Valued Signals 


In applications such as data transmission with quadrature-amplitude modulation (QAM)
signaling and beamforming with baseband processing of signals, the underlying data
signals and filter coefficients are complex-valued. To modify the LMS recursion for such
applications, we use the definition of gradient of real-valued functions of complex-valued
variables, as was defined in Section 3.5. We consider an adaptive filter with complex-valued
tap-input vector $\mathbf{x}(n)$, tap-weight vector
$\mathbf{w}(n) = [w_0^*(n)\; w_1^*(n)\; \cdots\; w_{N-1}^*(n)]^T$, output
$y(n) = \mathbf{w}^H(n)\mathbf{x}(n)$, and desired output $d(n)$.

6 See Chapter 14 for a formal derivation of Eq. (6.141).

The LMS algorithm in this case works based on the update equation 


$$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\nabla^C_{\mathbf{w}}|e(n)|^2 \tag{6.142}$$

where $\nabla^C_{\mathbf{w}}$ denotes the complex gradient operator with respect to the variable
vector $\mathbf{w}$. This is defined as

$$\nabla^C_{\mathbf{w}} = \left[\nabla^C_{w_0^*}\;\; \nabla^C_{w_1^*}\;\; \cdots\;\; \nabla^C_{w_{N-1}^*}\right]^T \tag{6.143}$$

where $\nabla^C_w$, as was defined in Section 3.5, is the complex gradient with respect to the
complex variable $w$. We recall that

$$\nabla^C_w = \frac{\partial}{\partial w_R} + j\frac{\partial}{\partial w_I} \tag{6.144}$$

where $w_R$ and $w_I$ are the real and imaginary parts of $w$, respectively, and $j = \sqrt{-1}$.
We also note that in Eq. (6.143), the elements of the gradient vector $\nabla^C_{\mathbf{w}}$ are
complex gradients with respect to the elements of $\mathbf{w}$, and these elements are the
conjugates of the actual tap weights, that is, $w_0^*, w_1^*, \ldots, w_{N-1}^*$. Furthermore,
we note that a direct substitution in Eq. (6.144) gives

$$\nabla^C_{w_i^*} = \frac{\partial}{\partial w_{i,R}} - j\frac{\partial}{\partial w_{i,I}} \tag{6.145}$$

Replacing $|e(n)|^2$ by $e(n)e^*(n)$, using Eq. (6.145), and following a derivation similar
to the one which has led to Eq. (3.63), we obtain

$$\nabla^C_{w_i^*}|e(n)|^2 = -2e^*(n)x(n-i), \quad \text{for } i = 0, 1, \ldots, N-1 \tag{6.146}$$

where the asterisk denotes complex conjugation. Substituting Eq. (6.146) and definition
(6.143) in Eq. (6.142), we obtain

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e^*(n)\mathbf{x}(n) \tag{6.147}$$


This is the desired LMS recursion for the case where the underlying processes are 
complex-valued. Table 6.5 gives a summary of implementation of the LMS algorithm 
for complex-valued signals. 

The convergence properties of the LMS algorithm for complex-valued signals are very 
similar to those of the real-valued signals. These properties are summarized as follows 
for reference: 


• The time constant equation (6.33) is also applicable to adaptive filters with
complex-valued signals.

Table 6.5 Summary of the complex LMS algorithm.

Input:   Tap-weight vector, $\mathbf{w}(n)$,
         input vector, $\mathbf{x}(n)$,
         and desired output, $d(n)$
Output:  Filter output, $y(n)$,
         tap-weight vector update, $\mathbf{w}(n+1)$

1. Filtering:
      $y(n) = \mathbf{w}^H(n)\mathbf{x}(n)$
2. Error estimation:
      $e(n) = d(n) - y(n)$
3. Tap-weight vector adaptation:
      $\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e^*(n)\mathbf{x}(n)$


• The misadjustment equation (6.61) has to be slightly modified. This modification
results from the fact that for complex-valued jointly Gaussian random variables, the
equality (6A.6) has to be replaced by

$$E[x_1x_2^*x_3x_4^*] = E[x_1x_2^*]E[x_3x_4^*] + E[x_1x_4^*]E[x_2^*x_3] \tag{6.148}$$

Taking note of this and following a similar derivation as in Section 6.3, we obtain⁷

$$\mathcal{M} = \frac{\displaystyle\sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-\mu\lambda_i}}{1-\displaystyle\sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-\mu\lambda_i}} \tag{6.149}$$

When the step-size parameter, $\mu$, is small, so that $\mu\lambda_i \ll 1$, for $i = 0, 1, \ldots, N-1$,
Eq. (6.149) reduces to Eq. (6.64). Thus, the approximation (6.64) is also applicable to
the case where the underlying signals are complex-valued.

Using Eq. (6.149) and following the same arguments as given in Section 6.3.4, we find
that in the case of complex-valued signals, the LMS algorithm remains stable when

$$0 < \mu < \frac{1}{2\,\mathrm{tr}[\mathbf{R}]} \tag{6.150}$$

Comparing this result with Eq. (6.74), we find that in the case of complex-valued
signals, the upper bound of $\mu$ is more relaxed when compared with the corresponding
bound for real-valued signals.
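
The recursion of Table 6.5 can be tried out with Python's built-in complex type. The following is a minimal sketch under stated assumptions: the function name `complex_lms`, the two-tap unknown vector `w_true`, the noise-free desired signal, and the step size are all hypothetical choices made for illustration. The input elements have unit variance, so $\mathrm{tr}[\mathbf{R}] = 2$ and the chosen $\mu$ lies well inside the bound (6.150).

```python
import random

def complex_lms(x_vecs, d, mu):
    """Complex LMS of Table 6.5: y = w^H x, e = d - y, w <- w + 2*mu*conj(e)*x."""
    w = [0j] * len(x_vecs[0])
    errors = []
    for xn, dn in zip(x_vecs, d):
        y = sum(wi.conjugate() * xi for wi, xi in zip(w, xn))    # w^H(n) x(n)
        e = dn - y
        w = [wi + 2.0 * mu * e.conjugate() * xi for wi, xi in zip(w, xn)]
        errors.append(e)
    return w, errors

def cgauss():
    """Zero-mean complex Gaussian sample with unit variance (0.5 per component)."""
    s = 0.5 ** 0.5
    return complex(random.gauss(0.0, s), random.gauss(0.0, s))

# Hypothetical noise-free test: d(n) = w_true^H x(n).
random.seed(1)
w_true = [0.5 - 0.3j, -0.2 + 0.8j]
x_vecs = [[cgauss(), cgauss()] for _ in range(3000)]
d = [sum(wt.conjugate() * xi for wt, xi in zip(w_true, xn)) for xn in x_vecs]
mu = 0.05                          # below 1/(2 tr[R]) = 0.25 for this input
w, errors = complex_lms(x_vecs, d, mu)
```

Note that `w` holds the elements of $\mathbf{w}(n)$, that is, the conjugates of the actual tap weights, in line with the convention of this section.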


7 A detailed derivation of this result can be found in Haykin (1991). 
6.10 Beamforming (Revisited)


The beamforming structure presented in Example 3.6 as well as earlier in this chapter
works with signals at their associated radio frequency (RF) or an intermediate frequency
(IF). We also recall that the carrier phase plays a major role in implementing a desired
beam pattern. To extract and use the carrier phase angles of the signals picked up
by the array elements, it was previously proposed that modulated carrier signals and
their associated 90° phase-shifted versions be processed simultaneously. The amplitude
and phase angle of an amplitude-modulated signal, such as $u(t) = a(t)\cos(\omega_c t + \phi)$
(where $t$ denotes continuous time), are preserved if it is converted to an equivalent
complex-valued baseband signal using a phase-quadrature demodulator structure, as
depicted in Figure 6.17. This structure suggests that the baseband equivalent of an RF
(or IF) signal $u(t) = a(t)\cos(\omega_c t + \phi)$, which preserves both phase and amplitude of
$u(t)$, is the sampled signal

$$\underline{u}(n) = a(n)e^{j\phi}$$

This is known as a phasor. Note that we have used underline notation to indicate that
$\underline{u}(n)$ is a phasor.

In the implementation of beamformers, working with phasor signals is more convenient
than working with RF (or IF) signals. In particular, from an implementation point of view,
digital processing of RF (or IF) signals requires a very high sampling rate to prevent
aliasing and allow any postprocessing of the sampled signals, while the required sampling
(Nyquist) rate for equivalent baseband signals is much lower.
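
A small numerical sketch may help to see how the phasor $a e^{j\phi}$ is recovered from the carrier signal. The code below is an illustration only: the in-phase branch multiplies by $2\cos(\omega_c t)$ as labeled in Figure 6.17, while the quadrature branch is assumed here to multiply by $-2\sin(\omega_c t)$, and the low-pass filter is replaced by a plain average over an integer number of carrier periods. The function name `to_phasor` and all numerical values are hypothetical, and the amplitude is held constant for simplicity.

```python
import cmath
import math

def to_phasor(samples, omega_c, dt):
    """Phase-quadrature demodulation of a carrier burst into one complex phasor.

    Mixing with 2cos(w_c t) and -2sin(w_c t) followed by averaging (a crude
    low-pass filter) leaves a*cos(phi) and a*sin(phi), respectively.
    """
    n = len(samples)
    in_phase = sum(2.0 * s * math.cos(omega_c * k * dt) for k, s in enumerate(samples)) / n
    quadrature = sum(-2.0 * s * math.sin(omega_c * k * dt) for k, s in enumerate(samples)) / n
    return complex(in_phase, quadrature)

# Hypothetical carrier: 1 kHz, sampled at 64 samples per period, over 20 full periods.
a, phi = 1.5, 0.7
omega_c = 2.0 * math.pi * 1000.0
dt = 1.0 / 64000.0
samples = [a * math.cos(omega_c * k * dt + phi) for k in range(64 * 20)]
u = to_phasor(samples, omega_c, dt)      # close to a * exp(j*phi)
```

The double-frequency terms produced by the mixers average to zero over whole carrier periods, which is why the simple mean is sufficient in this idealized setting.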


[Figure 6.17: phase-quadrature demodulator. Recoverable labels: input $u(t) = a(t)\cos(\omega_c t + \phi)$; carrier $2\cos(\omega_c t)$; LPF; outputs $a(n)\cos\phi$ (in-phase) and $a(n)\sin\phi$ (quadrature); $\underline{u}(n) = a(n)e^{j\phi}$.]

Figure 6.17 Conversion of an amplitude-modulated signal to its equivalent phase-quadrature
baseband (phasor) signal.

Example 6.3


In this example, we discuss the implementation of the beamformer of Figure 6.12 at
baseband using phasor signals. Figure 6.18 shows an equivalent implementation of Figure 6.12
when all signals are converted to their equivalent phasors. This implementation, as shown,
involves an adaptive filter with only one complex tap weight, $w$, whose optimum value
is obtained by minimizing

$$\xi = E[|\underline{d}(n) - w\underline{x}(n)|^2] \tag{6.151}$$

Using the definition (6.144) to obtain the gradient of $\xi$ with respect to $w$ and setting the
result equal to zero, we obtain

$$w_o = \frac{E[\underline{d}(n)\underline{x}^*(n)]}{E[|\underline{x}(n)|^2]} \tag{6.152}$$

where $w_o$ is the optimum value of $w$.
Converting $d(n)$ of Eq. (6.90) to its equivalent phasor, we get

$$\underline{d}(n) = \alpha(n)e^{j\phi_1} + \beta(n)e^{j(\phi_2 - \phi_o)} \tag{6.153}$$

Similarly,

$$\underline{x}(n) = \alpha(n)e^{j\phi_1} + \beta(n)e^{j\phi_2} \tag{6.154}$$


Substituting Eqs. (6.153) and (6.154) in Eq. (6.152) and recalling that $\alpha(n)$ and $\beta(n)$ are
zero-mean, real-valued and uncorrelated random variables, we obtain

$$w_o = \frac{\sigma_\alpha^2 + \sigma_\beta^2 e^{-j\phi_o}}{\sigma_\alpha^2 + \sigma_\beta^2} \tag{6.155}$$

With this value of $w_o$, the array power gain, $G(\theta)$, for a narrow-band signal arriving at
an angle $\theta$, is obtained as follows.
Assuming that the signal arriving at the angle $\theta$ is $\gamma(t)\cos\omega_c t$ and using Eq. (6.93),
with $\theta_o$ replaced by $\theta$, we get $\underline{d}(n) = \gamma(n)e^{-j\pi\sin\theta}$. We also note that
$\underline{x}(n) = \gamma(n)$. Thus,

$$\underline{e}(n) = \underline{d}(n) - w_o\underline{x}(n) = \gamma(n)\left(e^{-j\pi\sin\theta} - w_o\right)$$

and

$$G(\theta) = \frac{E[|\underline{e}(n)|^2]}{E[|\underline{x}(n)|^2]} = \left|e^{-j\pi\sin\theta} - w_o\right|^2 \tag{6.156}$$

Figure 6.18 Baseband implementation of a two-element beamformer.

Careful examination of Eqs. (6.95) and (6.156) reveals that, as one may expect, both
implementations of the beamformer (i.e., Figures 6.12 and 6.18) result in the same
optimized power gain. The beamformer tap weights $w_0$ and $w_1$ in Eq. (6.95) correspond
to the real and negative of the imaginary parts, respectively, of the complex tap weight
$w_o$ in Eq. (6.156). This, in turn, confirms that the two implementations are equivalent.

To adjust $w$ adaptively, we may use the complex LMS algorithm of Table 6.5 with the
following substitutions:

$$\mathbf{x}(n) = \underline{x}(n), \quad d(n) = \underline{d}(n), \quad e(n) = \underline{e}(n)$$

and

$$\mathbf{w}(n) = w^*(n)$$

If we run the resulting algorithm for a sufficient number of iterations and then use the
converged tap weight in Eq. (6.156), we will obtain the same directivity pattern as the one
presented in Figure 6.13, as the two implementations are equivalent.
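
Equations (6.155) and (6.156) are easy to evaluate numerically. The sketch below is illustrative only: the function names, the jammer angle of 45°, and the signal powers are hypothetical, and the relation $\phi_o = \pi\sin\theta_o$ is an assumption taken from the form of $\underline{d}(n)$ used above. With a strong jammer ($\sigma_\beta^2 \gg \sigma_\alpha^2$), the gain in the jammer direction comes out far below the gain at $\theta = 0$.

```python
import cmath
import math

def optimum_weight(var_alpha, var_beta, phi_o):
    """Optimum single complex tap weight, Eq. (6.155)."""
    return (var_alpha + var_beta * cmath.exp(-1j * phi_o)) / (var_alpha + var_beta)

def power_gain(theta, w_o):
    """Array power gain G(theta) of Eq. (6.156)."""
    return abs(cmath.exp(-1j * math.pi * math.sin(theta)) - w_o) ** 2

# Hypothetical scenario: weak signal at theta = 0, strong jammer at 45 degrees.
theta_o = math.radians(45.0)
phi_o = math.pi * math.sin(theta_o)      # assumed relation between phi_o and theta_o
var_alpha, var_beta = 0.01, 1.0
w_o = optimum_weight(var_alpha, var_beta, phi_o)
gain_jammer = power_gain(theta_o, w_o)   # deep null toward the jammer
gain_broadside = power_gain(0.0, w_o)    # gain toward theta = 0
```

For this setup the ratio `gain_jammer / gain_broadside` equals $(\sigma_\alpha^2/\sigma_\beta^2)^2$, which follows directly from substituting Eq. (6.155) in Eq. (6.156) at the two angles.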


So far, we have introduced beamformers that are limited to only two antennas. Such 
beamformers are capable of canceling only one jammer. Use of more elements, as 
shown in Figure 6.19, allows cancellation of more than one jammer. In general, to 
cancel M jammers, one requires at least M + 1 antennas. We may also recall that the 
implementation proposed in Example 6.3 and also those that were discussed previously 
do not differentiate between the jammer(s) and the desired signal. They simply adapt, 
so that the stronger signal(s) is (are) canceled, leaving behind the weaker signal(s). In 
cases where no jammer is present or the desired signal is strong, the latter is deleted 
by the beamformer. This problem can be prevented using an amended version of the 
LMS algorithm, which imposes a linear constraint on the tap weights of the adaptive 
filter. This variant, known as the linearly constrained LMS algorithm, is introduced in the
next section. To be able to apply the latter algorithm, the beamformer structure has to 
be modified as shown in Figure 6.20, where M + 1 antennas are used for cancellation 
of up to M jammers arriving from different directions. 

The fundamental difference between the two structures shown in Figures 6.19 and 
6.20 is that in the latter, there is no primary input. The tap weights of the beamformer 
of Figure 6.20 are optimized, so that its output, y(n), is minimized in the mean-square 
sense. To prevent the trivial solution of $w_i = 0$, for all $i$, a linear constraint, which
ensures a nonzero gain in the desired direction, is imposed on the beamformer tap weights 
before their optimization. The discussions provided in the next section and, especially, 
Example 6.4 will clarify this concept. 

We may also recall that the beamformer structures depicted in Figures 6.19 and 6.20 
assume that the signals picked up by array elements (antennas) are narrow-band. When the 
underlying signals are wideband, the output of each element has to go through a transversal 
filter, so that there would be some control over different frequency bins (Widrow and 
Stearns, 1985; Johnson and Dudgeon, 1993). 
Figure 6.19 Baseband implementation of an (M + 1)-element beamformer.

Figure 6.20 Alternative implementation of the (M + 1)-element baseband beamformer.

6.11 Linearly Constrained LMS Algorithm


In this section, we discuss the problem of Wiener filtering with a linear constraint imposed 
on the filter tap weights. We also present an LMS algorithm for adaptive adjustment 
of the filter tap weights subject to the required constraint. For the sake of simplicity, 
all derivations are given for the case of real-valued signals. However, we also give a 
summary of the final results for the case of complex-valued signals. Application of the 
proposed algorithm to narrow-band beamforming is then discussed as an example. 


6.11.1 Statement of the Problem and Its Optimal Solution 


Given an observation vector $\mathbf{x}(n)$ and a desired response $d(n)$, we wish to find a
tap-weight vector $\mathbf{w}$, so that

$$e(n) = d(n) - \mathbf{w}^T\mathbf{x}(n) \tag{6.157}$$

is minimized in the mean-square sense, subject to the constraint

$$\mathbf{c}^T\mathbf{w} = a \tag{6.158}$$

where $a$ is a scalar and $\mathbf{c}$ is a fixed column vector.
This problem can be solved using the method of Lagrange multipliers. According to
this method, we define (the superscript c stands for constraint)

$$\xi^c = E[e^2(n)] + \lambda(\mathbf{c}^T\mathbf{w} - a) \tag{6.159}$$

where $\lambda$ is the Lagrange multiplier, and solve the equations

$$\nabla_{\mathbf{w}}\xi^c = \mathbf{0} \quad \text{and} \quad \frac{\partial\xi^c}{\partial\lambda} = 0 \tag{6.160}$$

simultaneously. We note that $\partial\xi^c/\partial\lambda = 0$ results in the constraint (6.158).
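Substituting Eq. (6.157) in Eq. (6.159) and going through some manipulations similar
to those in Chapter 4 (Section 4.3), we obtain

$$\xi^c = \xi_{\min} + \mathbf{v}^T\mathbf{R}\mathbf{v} + \lambda(\mathbf{c}^T\mathbf{v} - a') \tag{6.161}$$

where $\mathbf{v} = \mathbf{w} - \mathbf{w}_o$, $\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p}$,
$\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$, $\mathbf{p} = E[d(n)\mathbf{x}(n)]$, and
$a' = a - \mathbf{c}^T\mathbf{w}_o$. With this, the above problem is reduced to the minimization of
$\mathbf{v}^T\mathbf{R}\mathbf{v}$, subject to the constraint $\mathbf{c}^T\mathbf{v} = a'$. The solution
to this problem is obtained by simultaneous solution of

$$\nabla_{\mathbf{v}}\xi^c = 2\mathbf{R}\mathbf{v}_o^c + \lambda\mathbf{c} = \mathbf{0} \tag{6.162}$$

and

$$\frac{\partial\xi^c}{\partial\lambda} = \mathbf{c}^T\mathbf{v}_o^c - a' = 0 \tag{6.163}$$

where $\mathbf{v}_o^c$ is the constrained optimum value of $\mathbf{v}$.
From Eq. (6.162), we obtain

$$\mathbf{v}_o^c = -\frac{\lambda}{2}\mathbf{R}^{-1}\mathbf{c} \tag{6.164}$$

Substituting Eq. (6.164) in Eq. (6.163), we get

$$-\frac{\lambda}{2}\mathbf{c}^T\mathbf{R}^{-1}\mathbf{c} - a' = 0$$

or

$$\lambda = -\frac{2a'}{\mathbf{c}^T\mathbf{R}^{-1}\mathbf{c}} \tag{6.165}$$

Finally, substituting Eq. (6.165) in Eq. (6.164), we obtain

$$\mathbf{v}_o^c = \frac{a'\mathbf{R}^{-1}\mathbf{c}}{\mathbf{c}^T\mathbf{R}^{-1}\mathbf{c}} \tag{6.166}$$

The minimum value of $\xi^c$ is obtained by substituting Eq. (6.166) in Eq. (6.161). This
gives

$$\xi^c_{\min} = \xi_{\min} + \frac{a'^2}{\mathbf{c}^T\mathbf{R}^{-1}\mathbf{c}} \tag{6.167}$$

We note that the second term on the right-hand side of Eq. (6.167) is the excess MSE,
which is introduced as a result of the imposed constraint.
Also, noting that $\mathbf{w} = \mathbf{v} + \mathbf{w}_o$, and using Eq. (6.166), we obtain

$$\mathbf{w}_o^c = \mathbf{w}_o + \frac{a'\mathbf{R}^{-1}\mathbf{c}}{\mathbf{c}^T\mathbf{R}^{-1}\mathbf{c}} \tag{6.168}$$
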
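
As a quick numerical check of Eqs. (6.166) to (6.168), consider a two-tap case. The sketch below is illustrative: the helper names and the particular values of $\mathbf{R}$, $\mathbf{p}$, $\mathbf{c}$, and $a$ are hypothetical, and a hand-written 2-by-2 solver is used only to keep the example self-contained.

```python
def solve2(R, b):
    """Solve the 2-by-2 system R z = b by Cramer's rule."""
    (r00, r01), (r10, r11) = R
    det = r00 * r11 - r01 * r10
    return [(r11 * b[0] - r01 * b[1]) / det, (r00 * b[1] - r10 * b[0]) / det]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def constrained_optimum(R, p, c, a):
    """Return w_o, the constrained optimum of Eq. (6.168), and the excess MSE of Eq. (6.167)."""
    w_o = solve2(R, p)                   # unconstrained Wiener solution R^{-1} p
    r_inv_c = solve2(R, c)               # R^{-1} c
    a_prime = a - dot(c, w_o)            # a' = a - c^T w_o
    denom = dot(c, r_inv_c)              # c^T R^{-1} c
    w_c = [wo_i + a_prime * ri / denom for wo_i, ri in zip(w_o, r_inv_c)]
    excess_mse = a_prime ** 2 / denom
    return w_o, w_c, excess_mse

# Hypothetical statistics for illustration.
R = [[2.0, 1.0], [1.0, 2.0]]
p = [1.0, 0.0]
c = [1.0, 1.0]
a = 1.0
w_o, w_c, excess_mse = constrained_optimum(R, p, c, a)
```

For these values, $\mathbf{w}_o = [2/3\;\; -1/3]^T$, $a' = 2/3$, and $\mathbf{c}^T\mathbf{R}^{-1}\mathbf{c} = 2/3$, so the constrained optimum is $[1\;\; 0]^T$ and the excess MSE is $2/3$, which agrees with evaluating $\mathbf{v}^T\mathbf{R}\mathbf{v}$ directly for $\mathbf{v} = [1/3\;\; 1/3]^T$.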
6.11.2 Update Equations 


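The adaptation of the tap-weight vector $\mathbf{w}$, while the constraint (6.158) holds, may be
done in two steps as follows:

Step 1.
$$\mathbf{w}^+(n) = \mathbf{w}(n) + 2\mu e(n)\mathbf{x}(n) \tag{6.169}$$

Step 2.
$$\mathbf{w}(n+1) = \mathbf{w}^+(n) + \boldsymbol{\vartheta}(n) \tag{6.170}$$

where $\boldsymbol{\vartheta}(n)$ is chosen, so that $\mathbf{c}^T\mathbf{w}(n+1) = a$, while
$\boldsymbol{\vartheta}^T(n)\boldsymbol{\vartheta}(n)$ is minimized. That is, we choose
$\boldsymbol{\vartheta}(n)$, so that the constraint (6.158) holds after Step 2, while
the perturbation introduced by $\boldsymbol{\vartheta}(n)$ is minimized.

The latter problem can also be solved using the method of Lagrange multipliers and
following a procedure similar to the one used above to obtain $\mathbf{v}_o^c$. This gives

$$\boldsymbol{\vartheta}(n) = \frac{a - \mathbf{c}^T\mathbf{w}^+(n)}{\mathbf{c}^T\mathbf{c}}\,\mathbf{c} \tag{6.171}$$

Substituting this result in Eq. (6.170), we obtain

$$\mathbf{w}(n+1) = \mathbf{w}^+(n) + \frac{a - \mathbf{c}^T\mathbf{w}^+(n)}{\mathbf{c}^T\mathbf{c}}\,\mathbf{c} \tag{6.172}$$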


The above derivations are summarized in Table 6.6. 
Table 6.6 Summary of the linearly constrained LMS algorithm.

Input:   Tap-weight vector, $\mathbf{w}(n)$,
         input vector, $\mathbf{x}(n)$,
         and desired output, $d(n)$
Output:  Filter output, $y(n)$,
         tap-weight vector update, $\mathbf{w}(n+1)$

1. Filtering:
      $y(n) = \mathbf{w}^T(n)\mathbf{x}(n)$
2. Error estimation:
      $e(n) = d(n) - y(n)$
3. Tap-weight vector adaptation:
      $\mathbf{w}^+(n) = \mathbf{w}(n) + 2\mu e(n)\mathbf{x}(n)$
      $\mathbf{w}(n+1) = \mathbf{w}^+(n) + \dfrac{a - \mathbf{c}^T\mathbf{w}^+(n)}{\mathbf{c}^T\mathbf{c}}\,\mathbf{c}$
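
The two-step update of Table 6.6 can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `constrained_lms`, the starting point on the constraint plane, the white two-element input, and the numerical values are hypothetical. For this setup $\mathbf{R} = \mathbf{I}$, so Eq. (6.168) predicts the constrained optimum $[1.1\;\; -0.1]^T$.

```python
import random

def constrained_lms(x_vecs, d, c, a, mu):
    """Linearly constrained LMS (Table 6.6) for real-valued signals."""
    cc = sum(ci * ci for ci in c)                # c^T c
    w = [a * ci / cc for ci in c]                # assumed start: a point satisfying c^T w = a
    for xn, dn in zip(x_vecs, d):
        y = sum(wi * xi for wi, xi in zip(w, xn))                 # filtering
        e = dn - y                                                # error estimation
        w_plus = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, xn)]        # Step 1, Eq. (6.169)
        corr = (a - sum(ci * wi for ci, wi in zip(c, w_plus))) / cc
        w = [wi + corr * ci for wi, ci in zip(w_plus, c)]         # Step 2, Eq. (6.172)
    return w

# Hypothetical test: unconstrained optimum w_true, constraint w_0 + w_1 = 1.
random.seed(2)
w_true = [0.8, -0.4]
c, a = [1.0, 1.0], 1.0
x_vecs = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(5000)]
d = [w_true[0] * xn[0] + w_true[1] * xn[1] for xn in x_vecs]
w = constrained_lms(x_vecs, d, c, a, mu=0.002)
```

The constraint is met after every iteration by construction, while the weights themselves fluctuate around the constrained optimum because the error does not vanish when the constraint is active.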


6.11.3 Extension to the Complex-Valued Case 


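When the underlying signals/variables are complex-valued, the following amendments have
to be made to the previous results:

• The constraint equation (6.158) is written as

$$\mathbf{w}^H\mathbf{c} = a \tag{6.173}$$

where the vector $\mathbf{c}$ and the scalar $a$ are both complex-valued.
• The constrained optimum tap-weight vector of the filter is obtained according to the
equation

$$\mathbf{w}_o^c = \mathbf{w}_o + \frac{a'^*\mathbf{R}^{-1}\mathbf{c}}{\mathbf{c}^H\mathbf{R}^{-1}\mathbf{c}} \tag{6.174}$$

where $a' = a - \mathbf{w}_o^H\mathbf{c}$.
• The adaptation of the filter tap weights is made according to the following equations:

$$e(n) = d(n) - \mathbf{w}^H(n)\mathbf{x}(n) \tag{6.175}$$

$$\mathbf{w}^+(n) = \mathbf{w}(n) + 2\mu e^*(n)\mathbf{x}(n) \tag{6.176}$$

and

$$\mathbf{w}(n+1) = \mathbf{w}^+(n) + \frac{a^* - \mathbf{c}^H\mathbf{w}^+(n)}{\mathbf{c}^H\mathbf{c}}\,\mathbf{c} \tag{6.177}$$
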
Example 6.4 


As an example of the linearly constrained LMS algorithm, we consider the two-element
narrow-band beamformer of Figure 6.21. Here, we consider the processing of signals in
baseband, that is, phasor (complex-valued) signals are considered. We note that with $s(n)$
and $\nu(n)$, as defined in Section 6.4.4,

$$x_0(n) = \alpha(n)e^{j\phi_1} + \beta(n)e^{j(\phi_2 - \phi_o)} \tag{6.178}$$

$$x_1(n) = \alpha(n)e^{j\phi_1} + \beta(n)e^{j\phi_2} \tag{6.179}$$

Figure 6.21 Baseband implementation of a two-element beamformer with a linearly constrained
beam pattern.

The beamformer tap weights, $w_0$ and $w_1$, are adjusted, so that its output, $y(n)$, is
minimized in the mean-square sense. This is equivalent to saying $d(n) = 0$. It is clear that if
there is no constraint on the tap weights and they are adjusted to minimize $E[|y(n)|^2]$,
we obtain the undesirable result of $w_{0,o} = w_{1,o} = 0$, which cancels both the jammer and
the desired signal. To ensure that the desired signal $s(n)$, arriving at the direction
perpendicular to the line connecting A to B, passes through the beamformer with no distortion,
the following constraint must hold:

$$w_0 + w_1 = 1$$

Using vector notations, this may be written as

$$\mathbf{w}^H\mathbf{c} = 1 \tag{6.180}$$

where $\mathbf{c} = [1\;\; 1]^T$ and $\mathbf{w} = [w_0^*\;\; w_1^*]^T$. We note that, in general, the
value of $\mathbf{c}$ will depend on the angle of arrival of the desired signal $s(n)$ with respect
to the perpendicular to the line connecting A to B; see Problem P6.29.

Letting $\mathbf{x}(n) = [x_0(n)\;\; x_1(n)]^T$ and noting that $\alpha(n)$ and $\beta(n)$ are
uncorrelated with each other, we get

$$\mathbf{R} = \begin{bmatrix} \sigma_\alpha^2 + \sigma_\beta^2 & \sigma_\alpha^2 + \sigma_\beta^2 e^{-j\phi_o} \\ \sigma_\alpha^2 + \sigma_\beta^2 e^{j\phi_o} & \sigma_\alpha^2 + \sigma_\beta^2 \end{bmatrix} \tag{6.181}$$

Using this result and noting that in the present case $\mathbf{c} = [1\;\; 1]^T$, $a = 1$, and
$\mathbf{w}_o = \mathbf{0}$, we obtain, from Eq. (6.174),

$$\mathbf{w}_o^c = \frac{1}{2(1 - \cos\phi_o)} \begin{bmatrix} 1 - e^{-j\phi_o} \\ 1 - e^{j\phi_o} \end{bmatrix} \tag{6.182}$$

Using this, we get

$$y_o(n) = (\mathbf{w}_o^c)^H\mathbf{x}(n) = \alpha(n)e^{j\phi_1}$$

which means that the desired signal, $s(n)$, passes through the beamformer with no
distortion, while the jammer, $\nu(n)$, is completely canceled.
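
The claim of Example 6.4 can be verified numerically for a single snapshot. The following sketch is illustrative only: the function names and the particular phase and amplitude values are hypothetical, and $\phi_o$ must not be a multiple of $2\pi$ for Eq. (6.182) to be defined.

```python
import cmath
import math

def constrained_weights(phi_o):
    """Constrained optimum tap-weight vector of Eq. (6.182)."""
    k = 1.0 / (2.0 * (1.0 - math.cos(phi_o)))
    return [k * (1.0 - cmath.exp(-1j * phi_o)), k * (1.0 - cmath.exp(1j * phi_o))]

def beamformer_output(w, x):
    """Output y = w^H x of the two-element beamformer."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))

# Hypothetical snapshot: unit-amplitude desired signal and a jammer five times stronger.
phi_o, phi_1, phi_2 = 1.2, 0.3, -2.0
alpha, beta = 1.0, 5.0
x0 = alpha * cmath.exp(1j * phi_1) + beta * cmath.exp(1j * (phi_2 - phi_o))   # Eq. (6.178)
x1 = alpha * cmath.exp(1j * phi_1) + beta * cmath.exp(1j * phi_2)             # Eq. (6.179)
w = constrained_weights(phi_o)
y = beamformer_output(w, [x0, x1])       # equals alpha * exp(j*phi_1) up to rounding
```

The output retains only the desired-signal term, and the two conjugated weights sum to one, as the constraint (6.180) requires.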


Problems 


P6.1 Show that when an adaptive filter has converged and $\mathbf{w}(n) \approx \mathbf{w}_o$,
Variance of $\nabla e^2(n) \approx 4\xi_{\min}E[x^2(n)]$.


P6.2 Formulate the LMS algorithm for a one-step ahead N-tap linear predictor, that 
is, a filter that predicts x(n) based on a linear combination of its past samples, 
x(n —1),x(n —2),...,x(n — N). 


P6.3 By multiplying $\mathbf{A} + \alpha\mathbf{a}\mathbf{a}^T$ with the right-hand side of Eq. (6.59),
confirm the equality (6.59).


P6.4 Prove that if $a$ and $b$ are two positive values and $a + b < 1$,

$$\frac{a}{1-a} + \frac{b}{1-b} < \frac{a+b}{1-(a+b)}$$

Use this result to establish the inequality (6.70).


P6.5 A 10-tap transversal adaptive filter is adapted using the LMS algorithm. Consider five
cases of the filter input, which are characterized by the following eigenvalues:

Case           1        2        3        4        5
$\lambda_0$    1.0000   1.8182   5.2632   1.8182   1.0989
$\lambda_1$    1.0000   1.6364   0.5263   1.8182   1.0989
$\lambda_2$    1.0000   1.4545   0.5263   1.8182   1.0989
$\lambda_3$    1.0000   1.2727   0.5263   1.8182   1.0989
$\lambda_4$    1.0000   1.0909   0.5263   1.8182   1.0989
$\lambda_5$    1.0000   0.9091   0.5263   0.1818   1.0989
$\lambda_6$    1.0000   0.7273   0.5263   0.1818   1.0989
$\lambda_7$    1.0000   0.5455   0.5263   0.1818   1.0989
$\lambda_8$    1.0000   0.3636   0.5263   0.1818   1.0989

Note that for all cases, the eigenvalues are normalized such that $\sum_i\lambda_i = \mathrm{tr}[\mathbf{R}] = 10$.
This implies that the tight stability bound Eq. (6.74) for all cases is the
same. The goal of this problem is to show the impact of the distribution of eigenvalues
of the correlation matrix of an adaptive filter input on the variation of the true stability
bound of the LMS algorithm, that is, the range of $\mu$ that guarantees a stable LMS
algorithm.

(i) Make a plot of $J$ (as defined in Eq. (6.65)) for each case when $\mu$ varies
from 0 to 0.1.
(ii) Find the range of $\mu$ in each case, which results in a stable LMS algorithm.
(iii) Discuss the various ranges that you have obtained in (ii) and compare them
with the tight bound $1/(3\,\mathrm{tr}[\mathbf{R}])$ and a softer bound that may be defined as
$1/\mathrm{tr}[\mathbf{R}]$. Discuss in which cases one is closer to the former bound or to the
latter bound.

P6.6 Equations (6.84) and (6.87) provide approximate expressions for expected learning
curves of the LMS algorithm in the two cases of system modeling and channel
equalization. For the five cases noted in Problem P6.5, plot the expected learning
curves of the LMS algorithm for system modeling and channel equalization and
discuss your observations.

P6.7 The input process to a system modeling problem, using a 10-tap FIR adaptive
filter, has the power spectral density shown in Figure P6.7. Assume that the MSE
at zeroth iteration is equal to 1 and $\xi_{\min} = 0.0001$, and the step-size parameter $\mu$
has been chosen for a 10% misadjustment. Present a typical learning curve of an
LMS algorithm in this setup. Indicate the time constants of the various modes of
convergence on the presented curve and the mean squared error (MSE) that the
LMS algorithm converges to.

Figure P6.7

P6.8 Consider a channel equalization problem similar to the one depicted in Figure 6.8.
The magnitude response of the channel, $|H(z)|$, is as shown in Figure P6.8.
Figure P6.8

The additive noise at the channel output has a variance of $\sigma^2 = 0.04$. The
transmitted data symbols, $s(n)$, take the values of +1 and −1 and are samples of a
white noise process.

(i) Draw the power spectral density of the sequence $x(n)$ and obtain an estimate
of $E[|x(n)|^2]$.
(ii) Give estimates of the maximum and minimum eigenvalues of the correlation
matrix of the input process to the equalizer.
(iii) When the conventional LMS algorithm is used to adjust the equalizer tap
weights and the equalizer has 20 taps, what is the value of the step-size
parameter $\mu$ which results in 10% misadjustment?
(iv) Obtain the range of time constants of the LMS algorithm in the present case
and plot a typical learning curve for that.

P6.9 It has been noted that when the tap weights of a line enhancer are initialized to
zero, it converges very fast. However, this will not be the case when tap weights
are randomized to some nonzero values. Explain why this is true.

P6.10 The sequence $u(n) = \cos(n\omega_o + \phi(n))$ is a narrow-band phase-modulated sampled
signal. The phase angle $\phi(n)$ is random, but varies slowly in time, so that
$\phi(n) \approx \phi(n-1) \approx \phi(n-2)$. The aim is to detect the carrier frequency $\omega_o$ of
$u(n)$. It is proposed that the setup shown in Figure P6.10 be used. The coefficient
$w$ has to be adjusted so as to maximize the output, $y(n)$, in the mean-square sense.

(i) Show that the optimized value of $w$ is

$$w_o \approx 2\cos\omega_o$$

(ii) Formulate the LMS algorithm for the present problem. In particular, specify
the filter tap-weight vector, $\mathbf{w}(n)$, input vector, $\mathbf{x}(n)$, the desired output,
$d(n)$, and how the output error is defined in the present case.
Figure P6.10

P6.11 The LMS algorithm is used to adapt an adaptive filter with tap-weight vector
$\mathbf{w}(n)$. Define $\bar{\mathbf{v}}(n) = E[\mathbf{w}(n) - \mathbf{w}_o]$, where $E[\cdot]$ denotes
statistical expectation and $\mathbf{w}_o$ is the optimum value of the filter tap-weight vector.

(i) Show that if the step-size parameter, $\mu$, is properly selected,
$\|\bar{\mathbf{v}}(n)\|^2 = \bar{\mathbf{v}}^T(n)\bar{\mathbf{v}}(n)$ will approach zero as $n$ increases.
(ii) Find the range of $\mu$ that guarantees convergence of $\|\bar{\mathbf{v}}(n)\|^2$. Does this range
guarantee the convergence of the LMS algorithm?
(iii) Find the time constants that govern the convergence of $\|\bar{\mathbf{v}}(n)\|^2$.

P6.12 A communication channel with a finite impulse response (FIR) whose duration does
not exceed M bit intervals is to be identified using the setup shown in Figure P6.12.
The transmitted data bits, $s(n)$, which take values of +1 and −1, are passed through
the channel, $H(z)$. The same data bits are passed through an adaptive filter, $\hat{H}(z)$,
which is adapted through the LMS algorithm, so that its output matches the output of
the channel in the mean-square sense. The channel noise is modeled as an additive noise
sequence $\nu(n)$ with variance $\sigma_\nu^2$. The sequences $s(n)$ and $\nu(n)$ are independent of
each other. Define the length M column vector $\mathbf{g}(n) = \mathbf{h}(n) - \mathbf{h}_o$, where
$\mathbf{h}(n)$ is the channel model tap-weight vector at iteration $n$ and the elements of the
vector $\mathbf{h}_o$ are the samples of the channel response.

Figure P6.12

(i) Show that

$$\mathbf{g}(n+1) = (\mathbf{I} - 2\mu\mathbf{s}(n)\mathbf{s}^T(n))\mathbf{g}(n) + 2\mu\nu(n)\mathbf{s}(n)$$

where $\mathbf{I}$ is the identity matrix, $\mathbf{s}(n) = [s(n)\; s(n-1)\; \cdots\; s(n-M+1)]^T$,
and $\mu$ is the LMS algorithm step-size parameter.
(ii) Use the independence assumption to show that

$$\|\mathbf{g}(n+1)\|^2 = E[\mathbf{g}^T(n)(\mathbf{I} - (4\mu - 4M\mu^2)\mathbf{R}_{ss})\mathbf{g}(n)] + 4M\mu^2\sigma_\nu^2$$

where $\|\mathbf{g}(n)\|^2 = E[\mathbf{g}^T(n)\mathbf{g}(n)]$ and $\mathbf{R}_{ss} = E[\mathbf{s}(n)\mathbf{s}^T(n)]$.
(iii) Use the result of Part (ii) to find the range of $\mu$, which guarantees the
convergence of $\|\mathbf{g}(n)\|^2$. Does this also guarantee the convergence of the
LMS algorithm?
(iv) Compare the range obtained in Part (iii) with the range of $\mu$ given in Eq.
(6.74).


P6.13 In this problem, we discuss the effect of the power level of the input process to
an adaptive filter and its variation on the convergence of the LMS algorithm.

(i) Consider the LMS recursion (6.9) and assume that the time constants of its
different modes of convergence are $\tau_0, \tau_1, \ldots, \tau_{N-1}$. Keep $\mu$ fixed, replace
$\mathbf{x}(n)$ by the scaled input $\alpha\mathbf{x}(n)$, where $\alpha$ is a constant, and obtain the
corresponding time constants of the resulting recursion, in terms of the $\tau_i$, under the
condition that the step-size parameter $\mu$ is small enough to guarantee the convergence
of the algorithm.

(ii) Under the condition that the power levels of the elements of $\mathbf{x}(n)$ are time
varying and fluctuate slowly between high and low levels, what is the shortcoming
of the LMS algorithm (discuss)? Can you suggest any solution to this?

P6.14 This problem attempts to show the validity of the approximation (6.86) in a
nonrigorous manner.
Consider a random process $x(n)$ and its associated (2M + 1)-by-(2M + 1) correlation
matrix $\mathbf{R}$. Let

$$\mathbf{q}_i = [q_{i,-M}\; \cdots\; q_{i,0}\; \cdots\; q_{i,M}]^T, \quad \text{with } \mathbf{q}_i^H\mathbf{q}_i = 1$$

be the $i$th eigenvector of $\mathbf{R}$ and $\lambda_i$ be its corresponding eigenvalue.

(i) Show that the expansion of the relationship $\mathbf{R}\mathbf{q}_i = \lambda_i\mathbf{q}_i$ leads to

$$\sum_{k=-M}^{M}\phi_{xx}(k-l)\,q_{i,k} = \lambda_i q_{i,l}, \quad \text{for } -M \le l \le M \tag{P6.11.1}$$

where $\phi_{xx}(k-l)$ is the autocorrelation function of $x(n)$ for lag $k - l$.

(ii) Let $M \to \infty$ and take the Fourier transform on both sides of equation (P6.11.1).
Show that this leads to the identity

$$\Phi_{xx}(e^{j\omega})Q_i(e^{j\omega}) = \lambda_i Q_i(e^{j\omega}) \tag{P6.11.2}$$

where $\Phi_{xx}(e^{j\omega})$ is the power spectral density of $x(n)$ and

$$Q_i(e^{j\omega}) = \sum_{k=-\infty}^{\infty} q_{i,k}e^{-j\omega k}$$

(iii) Consider the case when $\Phi_{xx}(e^{j\omega})$ is a single-valued function of the angular
frequency $\omega$. Using Eq. (P6.11.2), show that

$$Q_i(e^{j\omega}) = \begin{cases} \text{a nonzero value}, & \text{for } \omega = \omega_i \\ 0, & \text{otherwise.} \end{cases}$$

Thus, argue that when M is large, the set of vectors

$$\mathbf{q}_i = \frac{1}{\sqrt{2M+1}}\left[e^{-j\frac{2\pi iM}{2M+1}}\;\; e^{-j\frac{2\pi i(M-1)}{2M+1}}\;\; \cdots\;\; e^{j\frac{2\pi iM}{2M+1}}\right]^T$$

for $-M \le i \le M$, may be considered as an approximation to the eigenvectors of $\mathbf{R}$.
Also, from the Parseval relation (Chapter 2), recall that

$$\mathbf{q}_i^H\mathbf{q}_i = \frac{1}{2\pi}\int_{-\pi}^{\pi}|Q_i(e^{j\omega})|^2\,d\omega$$

Thus, conclude that

$$|Q_i(e^{j\omega})|^2 = 2\pi\delta(\omega - \omega_i)$$

where $\delta(\cdot)$ is the Kronecker delta function and $\omega = \omega_i$ is the solution of the
equation $\Phi_{xx}(e^{j\omega}) = \lambda_i$.

(iv) Extend the above result and argue that the latter approximation is also valid
in the cases where $\Phi_{xx}(e^{j\omega})$ is not necessarily single-valued.

Now consider the case where $x(n)$ is the output of a channel with system function
$H(z)$ and also the input to a (2M + 1)-tap equalizer $W(z)$, as in Section 6.4.2.
Ignore the channel noise and recall that, if the system delay $\Delta$ is assumed to
be zero, the transmitted data symbols are assumed to be uncorrelated, and the
equalizer is allowed to be noncausal,

$$W_o(z) \approx \frac{1}{H(z)}$$

where $W_o(z)$ is the optimum setting of $W(z)$.

(v) Using the approximation derived in Parts (iii) and (iv), show that when M
is large

$$w'_{o,i} = \mathbf{q}_i^H\mathbf{w}_o \approx \frac{1}{\sqrt{2M+1}}\,\frac{1}{H(e^{j\omega_i})}, \quad \text{for } -M \le i \le M$$

where $\mathbf{w}_o$ is the equalizer tap-weight vector.
(vi) Using this result, show that

$$\lambda_i|w'_{o,i}|^2 \approx \frac{1}{2M+1}$$

ajlwe il? x% 2M 41 
P6.15 Power-line-induced noise is common in many types of equipment and instruments.
Examples are humming noise in electric guitars and in electrocardiograms. A noise-canceling
setup similar to the one presented in Figure P6.15a may be used to cancel such
humming noise. Here, to simplify the derivation, the reference input (the source
of humming noise) is chosen to be the complex-valued sine wave $e^{j\omega_o n}$. It is
assumed that the frequency $\omega_o$ is known. The primary input $d(n)$ consists of the
desired signal, which has been contaminated by a humming noise of the same
frequency as the reference input, but with an unknown complex-valued gain $w_o$.
The single-tap adaptive filter, $W(z)$, will ideally adjust $w(n) = w_o$ to cancel the
humming noise perfectly.
The following parts, which follow a procedure similar to that of the noise canceler of
Widrow et al. (1975), show that the system presented in Figure P6.15a is effectively a
notch filter whose bandwidth is controlled by the step-size parameter $\mu$ of the
LMS algorithm.

(i) Present a recursive equation for the adaptation of the tap weight $w(n)$.
(ii) Using the result of (i), show that Figure P6.15a can be redrawn as in
Figure P6.15b.
(iii) By developing the difference equation that relates $y(n)$ and $e(n)$, show that
Figure P6.15b can be simplified to Figure P6.15c.
(iv) Using the result presented in Figure P6.15c, show that the signal sequences
$d(n)$ and $e(n)$ are related by the transfer function

$$\frac{E(z)}{D(z)} = \frac{1 - e^{j\omega_o}z^{-1}}{1 - (1 - 2\mu)e^{j\omega_o}z^{-1}}$$

(v) By presenting the pole and zero of the transfer function obtained in (iv) and
a proper argument, show that this is a notch filter, with a notch frequency at
$\omega = \omega_o$, whose bandwidth decreases with $\mu$.

Note that even though we started with an adaptive filter, which is naturally a
nonlinear time-varying system, we ended up showing that this effectively is a
linear time-invariant system!


Figure P6.15

P6.16 This problem, whose aim is to study the signed-regressor algorithm in some detail,
is based on the derivations of Eweda (1990a).

Using Price's theorem (Papoulis, 1991), one can show that if $x$ and $y$ are
a pair of zero-mean jointly Gaussian random variables,

$$E[x \cdot \mathrm{sign}(y)] = \frac{1}{\sigma_y}\sqrt{\frac{2}{\pi}}\,E[xy]$$

Consider the signed-regressor algorithm introduced in Section 6.5, and let the
assumptions made at the beginning of Section 6.3 apply. Show that:

(i)
$$E[\mathrm{sign}(\mathbf{x}(n))\mathbf{x}^T(n)] = E[\mathbf{x}(n)\mathrm{sign}(\mathbf{x}^T(n))] = \frac{1}{\sigma_x}\sqrt{\frac{2}{\pi}}\,\mathbf{R}$$

(ii)
$$\mathbf{v}(n+1) = (\mathbf{I} - 2\mu\,\mathrm{sign}(\mathbf{x}(n))\mathbf{x}^T(n))\mathbf{v}(n) + 2\mu e_o(n)\mathrm{sign}(\mathbf{x}(n))$$

(iii)
$$E[\mathbf{v}(n+1)] = \left(\mathbf{I} - \frac{2\mu}{\sigma_x}\sqrt{\frac{2}{\pi}}\,\mathbf{R}\right)E[\mathbf{v}(n)] \tag{P6.16.1}$$

and from there argue that the signed-regressor algorithm follows the same
trajectory as the conventional LMS algorithm.
(iv) Define $\|\mathbf{v}(n)\|^2 = E[\mathbf{v}^T(n)\mathbf{v}(n)]$ and show that

$$\|\mathbf{v}(n+1)\|^2 = \|\mathbf{v}(n)\|^2 - 4\mu\left(\frac{1}{\sigma_x}\sqrt{\frac{2}{\pi}} - \mu N\right)E[\mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)] + 4\mu^2N\xi_{\min}$$

(v) Assuming that the signed-regressor algorithm is convergent, show that its
misadjustment is given by

$$\mathcal{M} = \frac{\mu N}{\dfrac{1}{\sigma_x}\sqrt{\dfrac{2}{\pi}} - \mu N} \tag{P6.16.2}$$

(vi) From this result and following a line of argument similar to the one in
Section 6.3.5, show that the signed-regressor algorithm remains stable when

$$0 < \mu < \frac{1}{N\sigma_x}\sqrt{\frac{2}{\pi}}$$

(vii) When the step-size parameter, $\mu$, is small, Eq. (P6.16.2) reduces to

$$\mathcal{M} \approx \mu N\sigma_x\sqrt{\frac{\pi}{2}}$$

Using this result and Eq. (P6.16.1) and comparing these against their
counterparts in the conventional LMS algorithm, show that when the step-size
parameters of the two algorithms are chosen, so that both result in the same
misadjustment, the signed-regressor algorithm is $2/\pi$ times slower than the
conventional LMS algorithm.


Consider a case where the input vector, x(n), to an adaptive filter and its desired 
output, d(n), are fixed for all values of n. Assuming an initial value, w(0), for the
filter tap-weight vector and running the LMS algorithm with a small step-size parameter
(small enough to guarantee stability), find the final setting of the filter tap
weights after the convergence of the LMS algorithm. Using the result obtained,
confirm Nitzberg’s interpretation (Section 6.6) of the NLMS algorithm. 


In the derivation of the NLMS recursion (6.107), we searched for the step-size
parameter, $\mu(n)$, which would minimize $(e^+(n))^2$, where $e^+(n)$ is as defined
in Eq. (6.104). We may also note that $e^+(n) = e(n) - \mathbf{x}^T(n)\boldsymbol{\eta}(n)$, where
$\boldsymbol{\eta}(n) = \mathbf{w}(n+1) - \mathbf{w}(n)$, as defined in Eq. (6.108). Furthermore, we note that to have an
LMS algorithm with variable step-size parameter, the increment $\boldsymbol{\eta}(n)$ has to be in
the direction of $\mathbf{x}(n)$, that is, we may write $\boldsymbol{\eta}(n) = a(n)\mathbf{x}(n)$, where $a(n)$ is a scalar.


(i) Give an alternative derivation of the NLMS algorithm by optimizing $a(n)$,
so that $(e^+(n))^2$ is minimized.

(ii) To limit the perturbation introduced by the vector $\boldsymbol{\eta}(n)$, it is proposed that
$(e^+(n))^2 + \psi\,\boldsymbol{\eta}^T(n)\boldsymbol{\eta}(n)$ be minimized. Show that this leads to the recursion

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{1}{\mathbf{x}^T(n)\mathbf{x}(n) + \psi}\, e(n)\mathbf{x}(n)$$


P6.19
P6.20
P6.21
P6.22
P6.23
P6.24
P6.25

Consider the case where the input x(n) to an N-tap adaptive filter is generated 
by passing a white noise v(n) through an L-tap transversal filter. Let L < N 
and recall the definition (6.122). Show that for L > M, $E[\mathbf{X}(n)\mathbf{X}^T(n)] = L\mathbf{R}$.


Does this observation have any implication on the performance of the APLMS 
algorithm? Explain. 


In the APLMS algorithm, consider the case where M = N. Show that in this case, 
the APLMS recursion (6.132), irrespective of the value of $\mathbf{w}(n)$, always converges
to the solution $\mathbf{w}(n+1) = (\mathbf{X}^T(n))^{-1}\mathbf{d}(n)$, provided that $\mathbf{X}^T(n)$ is invertible.


Develop and present a version of the APLMS algorithm for the case where the 
underlying processes are complex-valued. 


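The following recursions have been proposed for implementation of a VSLMS
algorithm:

$$\boldsymbol{\mu}(n) = \boldsymbol{\mu}(n-1) - \rho\,\nabla_{\boldsymbol{\mu}}\, e^2(n)$$

and

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \boldsymbol{\mu}(n)\,\nabla_{\mathbf{w}}\, e^2(n)$$

where $\boldsymbol{\mu}(n)$ is a diagonal matrix consisting of N separate variable step-size
parameters, $\mu_0(n), \mu_1(n), \ldots, \mu_{N-1}(n)$, $\rho$ is a small positive step-size parameter, the
gradient $\nabla_{\boldsymbol{\mu}}\, e^2(n)$ is a diagonal matrix compatible with $\boldsymbol{\mu}(n)$ and is evaluated at
$\boldsymbol{\mu} = \boldsymbol{\mu}(n-1)$, and the gradient $\nabla_{\mathbf{w}}\, e^2(n)$ is a column vector, as usual, and is evaluated
at $\mathbf{w} = \mathbf{w}(n)$.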

Show that the proposal given above leads to the VSLMS algorithm, which was 
introduced in Section 6.7 and summarized in Table 6.4. 


Assuming that a scalar variable step-size parameter, $\mu(n)$, is used for all taps of
a transversal adaptive filter, show that a derivation similar to the one discussed
in Problem P6.22 leads to the following recursion:

$$\mu(n) = \mu(n-1) - \rho\, e(n)e(n-1)\,\mathbf{x}^T(n)\mathbf{x}(n-1)$$


Give details of the derivation of Eq. (6.61) from Eq. (6.57).


This problem looks at a variation of the LMS algorithm called the leaky LMS
algorithm. The leaky LMS algorithm works based on the recursion

$$\mathbf{w}(n+1) = \beta\mathbf{w}(n) + 2\mu e(n)\mathbf{x}(n)$$

where $\beta$ is a constant slightly smaller than 1.

(i) Define $\bar{\mathbf{w}}(n) = E[\mathbf{w}(n)]$, where $E[\cdot]$ denotes statistical expectation, and use
the independence assumption of Section 6.3 to show that the following
recursive equation holds:

$$\bar{\mathbf{w}}(n+1) = (\mathbf{I} - 2\mu\mathbf{R}')\bar{\mathbf{w}}(n) + 2\mu\mathbf{p}$$

Specify $\mathbf{R}'$ and $\mathbf{p}$ and obtain the time constants of the learning curve of the
leaky LMS algorithm in terms of the eigenvalues of the correlation matrix
$\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$ and the parameters $\beta$ and $\mu$.

(ii) Assuming that the step-size parameter $\mu$ is small enough to guarantee the
convergence of the leaky LMS algorithm, derive an equation for $\bar{\mathbf{w}}(\infty)$ in
terms of $\mathbf{R}'$ and $\mathbf{p}$.

(iii) Show that the difference between $\bar{\mathbf{w}}(\infty)$ and the optimum tap-weight vector
of the adaptive filter is given by the following equation.

$$\bar{\mathbf{w}}(\infty) - \mathbf{w}_o = -\gamma \sum_{i=0}^{N-1} \frac{\mathbf{q}_i^T\mathbf{w}_o}{\lambda_i + \gamma}\,\mathbf{q}_i$$

where $\gamma = \dfrac{1-\beta}{2\mu}$, and the $\lambda_i$'s and $\mathbf{q}_i$'s are the eigenvalues and eigenvectors of
$\mathbf{R}$, respectively.
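
As a rough numerical companion to this problem, the following sketch iterates the mean recursion of the leaky LMS algorithm, $E[\mathbf{w}(n+1)] = \beta E[\mathbf{w}(n)] + 2\mu(\mathbf{p} - \mathbf{R}E[\mathbf{w}(n)])$, which follows from the independence assumption, and compares its limit with the solution of $(\mathbf{R} + \gamma\mathbf{I})\mathbf{w} = \mathbf{p}$, where $\gamma = (1-\beta)/(2\mu)$. The correlation matrix, the vector `w_o`, and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np

N = 4
alpha = 0.9
# Hypothetical correlation matrix with elements alpha^|i-j|.
R = alpha ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
w_o = np.array([1.0, -0.5, 0.25, 0.1])      # hypothetical optimum tap weights
p = R @ w_o                                 # so that R w_o = p
mu, beta = 0.01, 0.999                      # assumed step size and leakage factor
gamma = (1 - beta) / (2 * mu)

# Mean tap-weight recursion under the independence assumption.
w_bar = np.zeros(N)
for _ in range(20000):
    w_bar = beta * w_bar + 2 * mu * (p - R @ w_bar)

# Closed-form limit and its distance from w_o via the eigen-decomposition of R.
w_inf = np.linalg.solve(R + gamma * np.eye(N), p)
lam, Q = np.linalg.eigh(R)
bias = -gamma * Q @ ((Q.T @ w_o) / (lam + gamma))
print("iterated limit  :", w_bar)
print("closed-form limit:", w_inf)
print("bias w(inf) - w_o:", bias)
```

The leakage biases the converged tap weights away from `w_o`, and the bias is largest along the eigenvectors with the smallest eigenvalues.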


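P6.26 Define the scalar value $\|\mathbf{v}(n)\|^2 = E[\mathbf{v}^T(n)\mathbf{v}(n)]$ as the misalignment of the
adaptive filter tap weights.


(i) Show that

$$\|\mathbf{v}(n)\|^2 = \mathrm{tr}[\mathbf{K}'(n)]$$

where the correlation matrix $\mathbf{K}'(n)$ is defined as in Eq. (6.31).

(ii) Use Eq. (6.52) to show that

$$\|\mathbf{v}(\infty)\|^2 = \xi_{\min}\,\frac{\displaystyle\sum_{i=0}^{N-1} \frac{\mu}{1 - 2\mu\lambda_i}}{\displaystyle 1 - \sum_{i=0}^{N-1} \frac{\mu\lambda_i}{1 - 2\mu\lambda_i}}$$

(iii) Show that when $\mu$ is small, the above result reduces to

$$\|\mathbf{v}(\infty)\|^2 \approx \mu N \xi_{\min}$$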


P6.27 A complex-valued random process x(n) = u(n) + v(n) is available. The process
$u(n) = a e^{j(\omega n + \phi)}$, where $a$ and $\phi$ are random, but fixed for every realization
of x(n). The process v(n) is a complex-valued noise which may not be white.
Assuming that the frequency, $\omega$, of u(n) is known, propose an adaptive filter and
its associated adaptation algorithm to filter out v(n) from x(n) and enhance u(n)
in the minimum MSE sense, preserving its phase, $\phi$, and its amplitude, $a$.


P6.28 Repeat Problem P6.27 when $u(n) = a\cos(\omega n + \phi)$ and v(n) is a real-valued noise
sequence.

P6.29


P6.30 


P6.31 


P6.32 


Give the details of the linearly constrained LMS algorithm required for adaptation 
of the tap-weight vector, w, of the beamformer of Example 6.4. 


The beamformer discussed in Example 6.4 assumes that the desired signal is 
arriving from the direction perpendicular to the line connecting A to B. What constraint
would have to be imposed on the tap weights $w_0$ and $w_1$ if the desired signal was
arriving at an angle 0 = 04? 


Griffiths and Jim (1982) have proposed a structure for beamforming whose
performance is similar to that of the Frost algorithm; however, it does not need any
constraint to be applied. Example 6.4 is an example of the Frost algorithm. The
equivalent implementation of Figure 6.21, which follows the idea of Griffiths and Jim,
is shown in Figure P6.31. Note that here the beamformer has only one tap weight, 
as opposed to Figure 6.21, which has two tap weights. Also, adaptation of the 
tap weight w of Figure P6.31 is based on the conventional LMS algorithm, as 
opposed to the Frost implementation (Figure 6.21), which requires the use of the 
linearly constrained LMS algorithm. 


(i) Explore the validity of the Griffiths and Jim algorithm in the present case. 

(ii) Figure P6.31 assumes that the desired signal is arriving in the direction 
perpendicular to the line connecting A to B. Modify this structure for the 
case when the desired signal is arriving at an angle 0 = 6). 


Figure P6.31 


The antenna array setup shown in Figure P6.32 has to be adapted to see a signal
coming from a look angle $\theta$. The spacing between the antennas is $\lambda_c/2$, where $\lambda_c$
is the wavelength of the carrier of the incoming signal. What constraint should
be applied to the tap weights $w_0$ through $w_{N-1}$:


(i) when $\theta = 0$.
(ii) when $\theta = \pi/4$.
(iii) For the cases in (i) and (ii), write down a constrained LMS algorithm for 
adaptation of the tap weights. 


Figure P6.32 


Computer-Oriented Problems 


All the programs that have been used to generate the results of the various examples of this
chapter are written in MATLAB and are available on the accompanying
website. The reader is encouraged to run these programs and confirm the results of this
chapter. It would also be useful and enlightening for the reader to try other variations
of the simulation parameters, such as the coloring filter in modeling, the channel response in
channel equalization, and the noise and sinusoidal signal powers in the adaptive line enhancer.
The following problems (case studies) are designed to guide the reader to many other
interesting results.


P6.33 


P6.34 


Consider the transfer function $H_1(z)$ of Eq. (6.80) as the channel response in the
equalizer setup of Figure 6.8. Set the equalizer length, N, equal to 15. Find the
minimum MSE of the equalizer for values of the delay, $\Delta$, in the range 3-15.
Repeat this experiment for the transfer function $H_2(z)$ of Eq. (6.81) as well. For
each case, find the optimum value of the delay, $\Delta$, which results in the minimum
MSE. From these observations, arrive at a rule of thumb for selection of $\Delta$ in
terms of the duration of the channel response and the equalizer length.


In the line enhancer problem that was studied in Section 6.4.3, we noted that
no slow mode appears on its learning curve when the tap weights are initialized
to zero. To observe the slow modes of the line enhancer, perform the following
experiment. Run the line enhancer program “lenhncr.m” (available on the
accompanying website), starting with zero tap weights and using an input similar
to the one used to obtain the result of Figure 6.11. After convergence of the line
enhancer, run it again, starting with the latest values of the tap weights obtained
in the first run as initial tap weights, but change the frequency of the sinusoid in
the input to $\omega_o = \pi/2$. Compare the two learning curves that you have obtained
and explain your observation.


P6.35
P6.36
P6.37
P6.38
P6.39
P6.40
P6.41


For the modeling problem that was discussed in Section 6.4.2, develop a program 
to study the convergence behavior of the NLMS algorithm. Compare your results 
with those of the conventional LMS algorithm for both choices of $H(z) = H_1(z)$
and $H(z) = H_2(z)$, presented in Eqs. (6.80) and (6.81), respectively.


Repeat P6.35 when the VSLMS algorithm is used for adaptation of the filter. 


Repeat P6.35 and P6.36 for the case where the adaptive filter of interest is the 
channel equalizer discussed in Section 6.4.2. 


For the modeling problem that was discussed in Section 6.4.1, develop a program 
to study the convergence behavior of the APLMS algorithm. Compare your results 
with those of the NLMS algorithm for both choices of $H(z) = H_1(z)$ and
$H(z) = H_2(z)$, presented in Eqs. (6.80) and (6.81), respectively. Perform your study for
the following cases of parameters and in each case find the misadjustment of both
algorithms.


(i) $\tilde{\mu} = 0.5$, $\psi = 0.0001$, and, for the APLMS algorithm, select M = 2, 3, 5,
and 7.

(ii) $\tilde{\mu} = 1$, $\psi = 0.0001$, and, for the APLMS algorithm, select M = 2, 3, 5,
and 7.


Write a program to study the convergence behavior of the line enhancer of 
Section 6.4.3 when the input, x(n), is given by 


$$x(n) = a\sin(\omega_1 n + \theta_1) + b\sin(\omega_2 n + \theta_2) + v(n)$$


where $\theta_1$ and $\theta_2$ are random phases that are uniformly distributed in the range 0
to $2\pi$. Obtain the learning curves of the line enhancer for the following choices
of the signal parameters:


(i) $\omega_1 = \pi/6$, $\omega_2 = 5\pi/8$, $a = 1$, $b = 1$, $\sigma_v^2 = 0.1$
(ii) $\omega_1 = \pi/6$, $\omega_2 = 5\pi/8$, $a = 5$, $b = 1$, $\sigma_v^2 = 0.1$
(iii) $\omega_1 = \pi/6$, $\omega_2 = \pi/4$, $a = 5$, $b = 1$, $\sigma_v^2 = 0.1$


Run your program for the cases when the filter tap weights are initialized to zero 
and also when they are initialized to some random values. Study the results that 
you obtain and explain your observations. 


Write your own program to confirm the results of Figure 6.13. 


By adding proper lines to the MATLAB program “equalizer.m,” study the
variation of the magnitude response of the equalizer as the LMS adaptation
proceeds. Perform your study for both choices of $H(z) = H_1(z)$ and $H(z) = H_2(z)$,
presented in Eqs. (6.72) and (6.73), respectively. Run the program for a
misadjustment of 10% and present the magnitude response of the equalizer after every
100 iterations.


Repeat Problem P6.41 when the LMS algorithm is replaced by the APLMS
algorithm. Perform your study for the following choices of the algorithm parameters.


(i) $\tilde{\mu} = 0.5$, $\psi = 0.0001$, and M = 2, 3, 5, and 7
(ii) $\tilde{\mu} = 1$, $\psi = 0.0001$, and M = 2, 3, 5, and 7


Consider a communication channel with a complex-valued impulse response con- 
sisting of the following samples: 


$$-0.1 + j0.2,\quad 0.15 - j0.4,\quad 1,\quad 0.5 - j0.2,\quad -0.2 + j0.1$$


where $j = \sqrt{-1}$. The input data symbols are randomly selected from the
alphabet


$$1 + j,\quad -1 + j,\quad -1 - j,\quad \text{and}\quad 1 - j$$


with equal probability. The channel noise is white and at 30 dB below the signal 
level at the equalizer input. Develop a program to simulate this scenario and study 
the convergence behavior of the LMS algorithm in this case. 


Consider the scenario that is discussed in Example 6.4. Develop a program for
adaptive adjustment of the coefficients $w_0$ and $w_1$. By running your program for
different choices of $\alpha$, $\theta_o$, $\phi_1$, and $\phi_2$, study the behavior of the constrained
LMS algorithm in this case.


Appendix 6A: Derivation of Eq. (6.39)
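Using the independence of $\mathbf{x}'(n)$ and $\mathbf{v}'(n)$, we get

$$\begin{aligned}
E[\mathbf{x}'(n)\mathbf{x}'^T(n)\mathbf{v}'(n)\mathbf{v}'^T(n)\mathbf{x}'(n)\mathbf{x}'^T(n)]
&= E[\mathbf{x}'(n)\mathbf{x}'^T(n)\,E[\mathbf{v}'(n)\mathbf{v}'^T(n)]\,\mathbf{x}'(n)\mathbf{x}'^T(n)] \\
&= E[\mathbf{x}'(n)\mathbf{x}'^T(n)\,\mathbf{K}'(n)\,\mathbf{x}'(n)\mathbf{x}'^T(n)]
\end{aligned} \tag{6A.1}$$

To expand the right-hand side of Eq. (6A.1), we first note that

$$\mathbf{x}'^T(n)\mathbf{K}'(n)\mathbf{x}'(n) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x'_i(n)\,x'_j(n)\,k'_{ij}(n) \tag{6A.2}$$

where $x'_i(n)$ is the ith element of the vector $\mathbf{x}'(n)$. We also define

$$\mathbf{C}(n) = \mathbf{x}'(n)\mathbf{x}'^T(n)\,\mathbf{K}'(n)\,\mathbf{x}'(n)\mathbf{x}'^T(n) \tag{6A.3}$$

and note that it is an N-by-N matrix. The lmth element of $\mathbf{C}(n)$ is

$$c_{lm}(n) = x'_l(n)\,x'_m(n)\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x'_i(n)\,x'_j(n)\,k'_{ij}(n) \tag{6A.4}$$

Taking statistical expectation on both sides of Eq. (6A.4), we obtain

$$E[c_{lm}(n)] = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} E[x'_l(n)\,x'_m(n)\,x'_i(n)\,x'_j(n)]\,k'_{ij}(n) \tag{6A.5}$$

Next, if we consider the assumption that the input samples x(n) (and, thus, the $x'_i(n)$'s) are
a set of jointly Gaussian random variables, and note that for any set of real-valued
zero-mean jointly Gaussian random variables $x_1$, $x_2$, $x_3$, and $x_4$

$$E[x_1x_2x_3x_4] = E[x_1x_2]E[x_3x_4] + E[x_1x_3]E[x_2x_4] + E[x_1x_4]E[x_2x_3] \tag{6A.6}$$

and, also,

$$E[x'_i(n)\,x'_j(n)] = \lambda_i\,\delta(i - j) \tag{6A.7}$$

where $\delta(\cdot)$ is the Kronecker delta function, we obtain

$$E[x'_l(n)\,x'_m(n)\,x'_i(n)\,x'_j(n)] = \lambda_l\lambda_i\,\delta(l - m)\delta(i - j) + \lambda_l\lambda_m\,\delta(l - i)\delta(m - j) + \lambda_l\lambda_m\,\delta(l - j)\delta(m - i) \tag{6A.8}$$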
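
The moment factoring property of Eq. (6A.6) can be checked by a quick Monte Carlo experiment. The sketch below assumes zero-mean variables and uses a hypothetical covariance matrix with elements $0.5^{|i-j|}$; the sample size and the random seed are arbitrary, and the agreement is only up to the sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance matrix of four zero-mean jointly Gaussian variables.
idx = np.arange(4)
C = 0.5 ** np.abs(np.subtract.outer(idx, idx))

# Draw samples; each row holds one realisation of (x1, x2, x3, x4).
x = rng.multivariate_normal(np.zeros(4), C, size=400_000)

# Sample estimate of E[x1 x2 x3 x4].
monte_carlo = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])

# Right-hand side of Eq. (6A.6), built from the second-order moments only.
formula = C[0, 1] * C[2, 3] + C[0, 2] * C[1, 3] + C[0, 3] * C[1, 2]

print("Monte Carlo estimate:", monte_carlo)
print("Eq. (6A.6)          :", formula)
```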


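Substituting Eq. (6A.8) in Eq. (6A.5), we obtain

$$\begin{aligned}
E[c_{lm}(n)] &= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \lambda_l\lambda_i\,\delta(l - m)\delta(i - j)\,k'_{ij}(n) \\
&\quad + \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \lambda_l\lambda_m\,\delta(l - i)\delta(m - j)\,k'_{ij}(n) \\
&\quad + \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \lambda_l\lambda_m\,\delta(l - j)\delta(m - i)\,k'_{ij}(n) \\
&= \lambda_l\,\delta(l - m)\sum_{i=0}^{N-1} \lambda_i k'_{ii}(n) + \lambda_l\lambda_m k'_{lm}(n) + \lambda_l\lambda_m k'_{ml}(n)
\end{aligned} \tag{6A.9}$$

for $l = 0, 1, \ldots, N-1$ and $m = 0, 1, \ldots, N-1$. Noting that $k'_{lm}(n) = k'_{ml}(n)$,
$\sum_{i=0}^{N-1} \lambda_i k'_{ii}(n) = \mathrm{tr}[\boldsymbol{\Lambda}\mathbf{K}'(n)]$, and using the result of Eq. (6A.9) to construct the matrix

$$E[\mathbf{C}(n)] = E[\mathbf{x}'(n)\mathbf{x}'^T(n)\mathbf{v}'(n)\mathbf{v}'^T(n)\mathbf{x}'(n)\mathbf{x}'^T(n)]$$

we obtain Eq. (6.39).


Transform Domain Adaptive Filters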


In Chapter 6, we noted that the convergence behavior of the LMS algorithm depends on the
eigenvalues of the correlation matrix, $\mathbf{R}$, of the adaptive filter input process. Furthermore,
we also saw that the eigenvalues of $\mathbf{R}$ are directly related to the power spectral density
of the underlying process. Hence, we may say that the convergence behavior of the LMS
algorithm is frequency dependent in the sense that for an adaptive filter with the transfer
function $W(e^{j\omega})$, the rate of convergence of $W(e^{j\omega})$ toward its optimum value, $W_o(e^{j\omega})$,
at a given frequency $\omega = \omega_o$, depends on the relative value of the power spectral density
of the underlying input signal at $\omega = \omega_o$, that is, $\Phi_{xx}(e^{j\omega_o})$. A large value of $\Phi_{xx}(e^{j\omega_o})$
(relative to the values of $\Phi_{xx}(e^{j\omega})$ at other frequencies) indicates that the adaptive filter
is well excited at $\omega = \omega_o$. This results in fast convergence around $\omega = \omega_o$. On the other
hand, the LMS algorithm converges very slowly over those frequency bands in which
the adaptive filter is poorly excited. This concept, which is intuitively understandable,
may also be confirmed through computer simulations (see simulation exercise P7.20 at
the end of this chapter).

A solution that one might intuitively consider for solving the above-mentioned problem 
of slow convergence of the LMS algorithm may be to employ a set of bandpass filters 
to partition the adaptive filter input into a few subbands and use a normalization process 
to equalize the energy content in each of the subbands. The equalized subband signals 
can then be used for adaptation of the filter tap weights. The content of this chapter is an 
elaboration of this principle for developing adaptive algorithms with better convergence 
behavior than the conventional LMS algorithm. 

In this chapter, we present an adaptive filtering scheme that uses an orthogonal transform 
for partitioning the filter input into subbands. This is called transform domain adaptive 
filter (TDAF) for obvious reasons. We present a thorough study of TDAF that includes 
not only its convergence behavior but also its efficient implementation. 


Adaptive Filters: Theory and Applications, Second Edition. Behrouz Farhang-Boroujeny. 
© 2013 John Wiley & Sons, Ltd. Published 2013 by John Wiley & Sons, Ltd.


Figure 7.1 Transform domain adaptive filter.


7.1 Overview of Transform Domain Adaptive Filters 


Figure 7.1 depicts a block schematic of a TDAF.¹ A set of input samples $x(n), x(n-1), \ldots,
x(n-N+1)$ to the filter are transformed to a new set of samples, $x_{\mathcal{T},0}(n),
x_{\mathcal{T},1}(n), \ldots, x_{\mathcal{T},N-1}(n)$, through an orthogonal transform ($\mathcal{T}$), before the filtering process.
The tap weights $w_{\mathcal{T},0}, w_{\mathcal{T},1}, \ldots, w_{\mathcal{T},N-1}$ are optimized so that the output error, e(n), is
minimized in the mean-square sense.

The orthogonal transform ($\mathcal{T}$) is implemented according to the following equation

$$\mathbf{x}_{\mathcal{T}}(n) = \mathcal{T}\mathbf{x}(n) \tag{7.1}$$

where $\mathbf{x}(n) = [x(n)\; x(n-1)\; \cdots\; x(n-N+1)]^T$ is the filter tap-input vector in the time
domain, $\mathbf{x}_{\mathcal{T}}(n) = [x_{\mathcal{T},0}(n)\; x_{\mathcal{T},1}(n)\; \cdots\; x_{\mathcal{T},N-1}(n)]^T$ is the filter tap-input vector in the
transform domain, and $\mathcal{T}$ is the transformation matrix, which is selected to be a unitary
matrix², that is,

$$\mathcal{T}^T\mathcal{T} = \mathcal{T}\mathcal{T}^T = \mathbf{I} \tag{7.2}$$

Here, we assume that the elements of $\mathcal{T}$ are real-valued. When the elements of $\mathcal{T}$ are
complex-valued, the superscript T in Eq. (7.2), indicating transposition, has to be replaced
by H, that is, Hermitian transposition.

The filter output is obtained according to the equation

$$y(n) = \mathbf{w}_{\mathcal{T}}^T\mathbf{x}_{\mathcal{T}}(n) \tag{7.3}$$

where $\mathbf{w}_{\mathcal{T}} = [w_{\mathcal{T},0}\; w_{\mathcal{T},1}\; \cdots\; w_{\mathcal{T},N-1}]^T$. We may note that although $\mathbf{x}_{\mathcal{T}}(n)$ is in the
transform domain, the filter output, y(n), is in the time domain. The estimation error

$$e(n) = d(n) - y(n) \tag{7.4}$$

is also in the time domain.

¹ The TDAF was first proposed by Narayan et al. (1981, 1983).
² Throughout this chapter, the symbol $\mathcal{T}$ will be used to represent a unitary matrix satisfying Eq. (7.2).

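The cost function used to optimize the filter tap weights is

$$\xi = E[e^2(n)] \tag{7.5}$$

Substituting Eqs. (7.3) and (7.4) in Eq. (7.5), we obtain

$$\xi = \mathbf{w}_{\mathcal{T}}^T\mathbf{R}_{\mathcal{T}}\mathbf{w}_{\mathcal{T}} - 2\mathbf{w}_{\mathcal{T}}^T\mathbf{p}_{\mathcal{T}} + E[d^2(n)] \tag{7.6}$$

where $\mathbf{R}_{\mathcal{T}} = E[\mathbf{x}_{\mathcal{T}}(n)\mathbf{x}_{\mathcal{T}}^T(n)]$ and $\mathbf{p}_{\mathcal{T}} = E[d(n)\mathbf{x}_{\mathcal{T}}(n)]$. Setting the gradient of $\xi$ with
respect to $\mathbf{w}_{\mathcal{T}}$ equal to zero, we obtain the corresponding Wiener-Hopf equation whose
solution gives the optimum tap-weight vector of the TDAF as

$$\mathbf{w}_{\mathcal{T},o} = \mathbf{R}_{\mathcal{T}}^{-1}\mathbf{p}_{\mathcal{T}} \tag{7.7}$$

Substituting this result in Eq. (7.6), the minimum mean-squared error (MSE) of the TDAF
is obtained as

$$\xi_{\min} = E[d^2(n)] - \mathbf{p}_{\mathcal{T}}^T\mathbf{R}_{\mathcal{T}}^{-1}\mathbf{p}_{\mathcal{T}} \tag{7.8}$$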


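To compare this with the minimum MSE associated with the conventional transversal
structure case (i.e., without the orthogonal transformation, $\mathcal{T}$), we note that

$$\mathbf{R}_{\mathcal{T}} = E[\mathbf{x}_{\mathcal{T}}(n)\mathbf{x}_{\mathcal{T}}^T(n)] = \mathcal{T}E[\mathbf{x}(n)\mathbf{x}^T(n)]\mathcal{T}^T = \mathcal{T}\mathbf{R}\mathcal{T}^T \tag{7.9}$$

and

$$\mathbf{p}_{\mathcal{T}} = E[d(n)\mathbf{x}_{\mathcal{T}}(n)] = \mathcal{T}E[d(n)\mathbf{x}(n)] = \mathcal{T}\mathbf{p} \tag{7.10}$$

Substituting Eqs. (7.9) and (7.10) in Eq. (7.8) and using Eq. (7.2), which gives
$(\mathcal{T}\mathbf{R}\mathcal{T}^T)^{-1} = \mathcal{T}\mathbf{R}^{-1}\mathcal{T}^T$, after some straightforward manipulations, we get

$$\xi_{\min} = E[d^2(n)] - \mathbf{p}^T\mathbf{R}^{-1}\mathbf{p} \tag{7.11}$$

Comparing this result with Eq. (3.27), we find that the minimum MSE associated with a
conventional transversal filter and its corresponding TDAF is the same. This could also
be understood, intuitively, if we note that the transformation $\mathbf{x}_{\mathcal{T}}(n) = \mathcal{T}\mathbf{x}(n)$ is reversible
(i.e., $\mathbf{x}(n) = \mathcal{T}^T\mathbf{x}_{\mathcal{T}}(n)$) and, thus, any output $y(n) = \mathbf{w}^T\mathbf{x}(n)$ can also be obtained from
$\mathbf{x}_{\mathcal{T}}(n)$ using an appropriate tap-weight vector $\mathbf{w}_{\mathcal{T}}$. To find the relationship between $\mathbf{w}$ and
$\mathbf{w}_{\mathcal{T}}$, we simply let $\mathbf{w}^T\mathbf{x}(n) = \mathbf{w}_{\mathcal{T}}^T\mathbf{x}_{\mathcal{T}}(n)$ and use Eq. (7.1) to obtain

$$\mathbf{w}_{\mathcal{T}} = \mathcal{T}\mathbf{w} \tag{7.12}$$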
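
The relations (7.7) through (7.12) can be checked numerically. The following sketch is only an illustration: the correlation matrix, the cross-correlation vector, the value of $E[d^2(n)]$, and the orthogonal matrix (obtained here from a QR factorization of a random matrix) are all hypothetical choices, not quantities taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4

# Hypothetical second-order statistics of the time-domain filter.
R = 0.9 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
w_true = np.array([0.5, -1.0, 0.3, 0.2])
p = R @ w_true                       # cross-correlation vector
Ed2 = w_true @ R @ w_true + 0.01     # E[d^2(n)], including a noise power of 0.01

# Any real orthogonal matrix satisfies Eq. (7.2); here one is built by QR.
T, _ = np.linalg.qr(rng.standard_normal((N, N)))

R_T = T @ R @ T.T                    # Eq. (7.9)
p_T = T @ p                          # Eq. (7.10)

w_o = np.linalg.solve(R, p)          # time-domain Wiener solution
w_To = np.linalg.solve(R_T, p_T)     # Eq. (7.7)

xi_min = Ed2 - p @ w_o               # Eq. (7.11)
xi_min_T = Ed2 - p_T @ w_To          # Eq. (7.8)

print("minimum MSE (time domain)     :", xi_min)
print("minimum MSE (transform domain):", xi_min_T)
print("w_T,o equals T w_o            :", np.allclose(w_To, T @ w_o))
```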


Before going into further details of TDAFs, in the next two sections, we study a very 
specific feature of orthogonal transforms, which makes them suitable for adaptive filtering 
algorithms. 


7.2 Band-Partitioning Property of Orthogonal Transforms 


We explore the discrete cosine transform (DCT) as an example of orthogonal transforms.
The DCT of a sequence $\{x(n), x(n-1), \ldots, x(n-N+1)\}$ is defined as

$$x_{\mathrm{DCT},k}(n) = \sum_{l=0}^{N-1} c_{kl}\,x(n-l), \qquad \text{for } k = 0, 1, \ldots, N-1 \tag{7.13}$$

where

$$c_{kl} = \begin{cases}
\dfrac{1}{\sqrt{N}}, & k = 0 \text{ and } l = 0, 1, \ldots, N-1 \\[2ex]
\sqrt{\dfrac{2}{N}}\cos\dfrac{\pi(2l+1)k}{2N}, & k = 1, 2, \ldots, N-1 \text{ and } l = 0, 1, \ldots, N-1
\end{cases} \tag{7.14}$$

are the DCT coefficients. It is also worth noting that Eq. (7.13) may be written as

$$\mathbf{x}_{\mathrm{DCT}}(n) = \mathcal{T}_{\mathrm{DCT}}\,\mathbf{x}(n) \tag{7.15}$$

where $\mathcal{T}_{\mathrm{DCT}}$ is the N-by-N DCT matrix. The klth element of $\mathcal{T}_{\mathrm{DCT}}$ is $c_{kl}$, as defined in
Eq. (7.14), and

$$\mathbf{x}_{\mathrm{DCT}}(n) = [x_{\mathrm{DCT},0}(n)\; x_{\mathrm{DCT},1}(n)\; \cdots\; x_{\mathrm{DCT},N-1}(n)]^T$$

Besides being a linear transformation, the process defined by Eq. (7.13) (or Eq. (7.15))
may also be viewed as an implementation of a bank of finite-impulse response (FIR)
filters whose coefficients are the $c_{kl}$'s. Here, these are referred to as DCT filters. The transfer
function of the kth DCT filter is

$$C_k(z) = \sum_{l=0}^{N-1} c_{kl}\,z^{-l} \tag{7.16}$$


Figure 7.2 shows the magnitude responses of the DCT filters when N = 8. The plots 
clearly show the band-partitioning property of the DCT filters. Each response has a 
large main lobe that may be identified as its passband, and a number of side lobes that 
correspond to its stop-band. Similar plots (with some variations in the shapes) are also 
obtained for other commonly used orthogonal transforms, for example, discrete Fourier 
transform (DFT). 
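
The band-partitioning property can be explored with a few lines of code. The sketch below builds the DCT matrix of Eq. (7.14) for N = 8, evaluates the magnitude responses of the DCT filters of Eq. (7.16) on a frequency grid, and reports where each response peaks. The helper name `dct_matrix` and the grid size are assumptions made for this illustration.

```python
import numpy as np

def dct_matrix(N):
    """N-by-N DCT matrix whose (k, l) element is c_kl of Eq. (7.14)."""
    k = np.arange(N).reshape(-1, 1)
    l = np.arange(N).reshape(1, -1)
    T = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * l + 1) * k / (2 * N))
    T[0, :] = 1.0 / np.sqrt(N)       # first row (k = 0) is constant
    return T

N = 8
T = dct_matrix(N)
omega = np.linspace(0.0, np.pi, 1025)                 # frequency grid on [0, pi]
E = np.exp(-1j * np.outer(np.arange(N), omega))       # E[l, m] = exp(-j * omega_m * l)
C = np.abs(T @ E)                                     # row k holds |C_k(e^{j omega})|

peak_freqs = omega[np.argmax(C, axis=1)]
for k in range(N):
    print(f"DCT filter {k}: response peaks near omega = {peak_freqs[k] / np.pi:.3f} pi")
```

Each row of `C` can be plotted against `omega` to obtain curves of the kind described above; the kth filter has its main lobe in the neighbourhood of $\omega = k\pi/N$.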

Before we elaborate on how or why this band-partitioning property of orthogonal
transforms is important to us, we shall look at this property from a different angle in the
next section.

Figure 7.2 Magnitude responses of the DCT filters for N = 8.


7.3 Orthogonalization Property of Orthogonal Transforms 


The band-partitioning property of orthogonal transforms gives a frequency domain view
of them. The dual of this in the time domain is the orthogonalization property of such
transforms. This property can be deduced intuitively from the band-partitioning property
observed in Section 7.2. We recall that processes with mutually exclusive spectral bands
are uncorrelated with one another (Papoulis, 1991). On the other hand, from the
band-partitioning property, we note that the elements of the transformed tap-input vector, $\mathbf{x}_{\mathcal{T}}(n)$,
constitute a set of random processes with approximately mutually exclusive spectral bands.
This implies that the elements of $\mathbf{x}_{\mathcal{T}}(n)$ are (at least) approximately uncorrelated with
one another. This, in turn, implies that the correlation matrix $\mathbf{R}_{\mathcal{T}} = E[\mathbf{x}_{\mathcal{T}}(n)\mathbf{x}_{\mathcal{T}}^T(n)]$ is
closer to a diagonal matrix than $\mathbf{R}$ is. An appropriate normalization can convert $\mathbf{R}_{\mathcal{T}}$ to
a normalized matrix $\mathbf{R}_{\mathcal{T}}^N$ whose eigenvalue spread will be much smaller than that of
$\mathbf{R}$, thereby improving the convergence behavior of the LMS algorithm in the transform
domain. This can be best explained through a numerical example.

Consider the case where x(n) is a first-order autoregressive process generated by passing
a white noise process through the system function³

$$H(z) = \frac{\sqrt{1-\alpha^2}}{1 - \alpha z^{-1}} \tag{7.17}$$

where $\alpha$ is a constant in the range of −1 to +1. For $\alpha = 0.9$, we obtain

$$\mathbf{R} = \begin{bmatrix}
1.0000 & 0.9000 & 0.8100 & 0.7290 \\
0.9000 & 1.0000 & 0.9000 & 0.8100 \\
0.8100 & 0.9000 & 1.0000 & 0.9000 \\
0.7290 & 0.8100 & 0.9000 & 1.0000
\end{bmatrix} \tag{7.18}$$

For a derivation of $\mathbf{R}$, see Example 4.1. Using the DCT as the transformation, we get

$$\mathbf{R}_{\mathcal{T}} = \begin{bmatrix}
3.5245 & 0.0000 & -0.0855 & 0.0000 \\
0.0000 & 0.3096 & 0.0000 & -0.0032 \\
-0.0855 & 0.0000 & 0.1045 & 0.0000 \\
0.0000 & -0.0032 & 0.0000 & 0.0614
\end{bmatrix} \tag{7.19}$$

³ The reader may recall that we used the same process in many examples in the previous chapters.

This clearly is much closer to diagonal (i.e., its off-diagonal elements are relatively closer
to zero) when compared to $\mathbf{R}$.
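
The matrices of this example are easy to reproduce. The sketch below assumes that the correlation matrix of the autoregressive process has elements $\alpha^{|i-j|}$, which is the pattern visible in Eq. (7.18), and applies the DCT matrix of Eq. (7.14) with N = 4; the helper name `dct_matrix` is hypothetical.

```python
import numpy as np

def dct_matrix(N):
    """N-by-N DCT matrix whose (k, l) element is c_kl of Eq. (7.14)."""
    k = np.arange(N).reshape(-1, 1)
    l = np.arange(N).reshape(1, -1)
    T = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * l + 1) * k / (2 * N))
    T[0, :] = 1.0 / np.sqrt(N)
    return T

alpha, N = 0.9, 4
# Correlation matrix with elements alpha^|i-j| (pattern of Eq. (7.18)).
R = alpha ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

T = dct_matrix(N)
R_T = T @ R @ T.T                    # transform-domain correlation matrix, Eq. (7.9)

np.set_printoptions(precision=4, suppress=True)
print("R =\n", R)
print("R_T =\n", R_T)
```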

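The normalization performed in the implementation of the LMS algorithm in the transform
domain (as we shall see later), in effect, is equivalent to normalization of the elements
of $\mathbf{x}_{\mathcal{T}}(n)$ to the power of unity. This is done by premultiplying $\mathbf{x}_{\mathcal{T}}(n)$ with a diagonal
matrix, $\mathbf{D}^{-1/2}$, before the filtering and adaptation process, where $\mathbf{D}^{-1/2}$ is the inverse of
the square root of the diagonal matrix

$$\mathbf{D} = \begin{bmatrix}
E[x_{\mathcal{T},0}^2(n)] & 0 & \cdots & 0 \\
0 & E[x_{\mathcal{T},1}^2(n)] & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & E[x_{\mathcal{T},N-1}^2(n)]
\end{bmatrix} \tag{7.20}$$

Thus, we get

$$\mathbf{x}_{\mathcal{T}}^N(n) = \mathbf{D}^{-1/2}\mathbf{x}_{\mathcal{T}}(n) \tag{7.21}$$

where $\mathbf{x}_{\mathcal{T}}^N(n)$ is the normalized tap-input vector. The correlation matrix associated with
$\mathbf{x}_{\mathcal{T}}^N(n)$ is

$$\mathbf{R}_{\mathcal{T}}^N = \mathbf{D}^{-1/2}\mathbf{R}_{\mathcal{T}}\mathbf{D}^{-1/2} \tag{7.22}$$

Furthermore, we note that

$$\mathbf{D} = \mathrm{diag}[\mathbf{R}_{\mathcal{T}}] \tag{7.23}$$

where $\mathrm{diag}[\mathbf{R}_{\mathcal{T}}]$ denotes the diagonal matrix consisting of the diagonal elements of $\mathbf{R}_{\mathcal{T}}$.

The reader may easily verify that the mean-squared values of the elements of $\mathbf{x}_{\mathcal{T}}^N(n)$ as
well as the diagonal elements of $\mathbf{R}_{\mathcal{T}}^N$ are all equal to unity as a result of this normalization.
For the above example, we get

$$\mathbf{D}^{-1/2} = \begin{bmatrix}
0.5327 & 0 & 0 & 0 \\
0 & 1.7972 & 0 & 0 \\
0 & 0 & 3.0934 & 0 \\
0 & 0 & 0 & 4.0356
\end{bmatrix} \tag{7.24}$$

and

$$\mathbf{R}_{\mathcal{T}}^N = \begin{bmatrix}
1.0000 & 0.0000 & -0.1409 & 0.0000 \\
0.0000 & 1.0000 & 0.0000 & -0.0231 \\
-0.1409 & 0.0000 & 1.0000 & 0.0000 \\
0.0000 & -0.0231 & 0.0000 & 1.0000
\end{bmatrix} \tag{7.25}$$

To compare the performance of the conventional LMS algorithm (in the time domain)
with its associated implementation in a transform domain (explained in the following
section), the eigenvalue spreads of $\mathbf{R}$ and $\mathbf{R}_{\mathcal{T}}^N$ have to be examined. For the example
mentioned previously, we obtain

eigenvalue spread of $\mathbf{R}$ = 57.5

and

eigenvalue spread of $\mathbf{R}_{\mathcal{T}}^N$ = 1.33.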

For the present example, these results predict a much superior performance of the LMS 
algorithm in the transform domain compared to its conventional implementation in the 
time domain. This, clearly, is a direct consequence of the orthogonalization property 
of the DCT, as was demonstrated previously. This argument justifies the application of 
orthogonal transforms for improving the performance of the LMS algorithm. 
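
The normalization step and the two eigenvalue spreads quoted above can be reproduced as follows. As before, this is a sketch under the assumption that the correlation matrix has elements $\alpha^{|i-j|}$ with $\alpha = 0.9$ and N = 4; the helper names are hypothetical.

```python
import numpy as np

def dct_matrix(N):
    """N-by-N DCT matrix whose (k, l) element is c_kl of Eq. (7.14)."""
    k = np.arange(N).reshape(-1, 1)
    l = np.arange(N).reshape(1, -1)
    T = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * l + 1) * k / (2 * N))
    T[0, :] = 1.0 / np.sqrt(N)
    return T

def eigenvalue_spread(A):
    """Ratio of the largest to the smallest eigenvalue of a symmetric matrix."""
    lam = np.linalg.eigvalsh(A)
    return lam.max() / lam.min()

alpha, N = 0.9, 4
R = alpha ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
T = dct_matrix(N)
R_T = T @ R @ T.T

# D = diag[R_T]; normalisation as in Eq. (7.22).
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(R_T)))
R_TN = D_inv_sqrt @ R_T @ D_inv_sqrt

spread_R = eigenvalue_spread(R)
spread_R_TN = eigenvalue_spread(R_TN)
print(f"eigenvalue spread of R    = {spread_R:.1f}")
print(f"eigenvalue spread of R_TN = {spread_R_TN:.2f}")
```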

In our study in this chapter, we find that for a given transform, the degree of improve- 
ment achieved by replacing the conventional LMS algorithm with its transform domain 
counterpart depends on the power spectral density of the underlying input process. We
emphasize the band-partitioning property of orthogonal transforms and present some
theoretical results that explain this phenomenon. We also find that a rough estimate of
the power spectral density of the underlying input process is sufficient for the purpose of 
selecting an appropriate transform. 


7.4 Transform Domain LMS Algorithm 


In the implementation of the transform domain LMS (TDLMS) algorithm, the filter tap
weights are updated according to the following recursion:

$$\mathbf{w}_{\mathcal{T}}(n+1) = \mathbf{w}_{\mathcal{T}}(n) + 2\mu\hat{\mathbf{D}}^{-1}e(n)\mathbf{x}_{\mathcal{T}}(n) \tag{7.26}$$

where $\hat{\mathbf{D}}$ is an estimate of the diagonal matrix $\mathbf{D}$. This vector recursion can be decomposed
into the following N scalar recursions:

$$w_{\mathcal{T},i}(n+1) = w_{\mathcal{T},i}(n) + \frac{2\mu}{\hat{\sigma}^2_{x_{\mathcal{T},i}}(n)}\,e(n)\,x_{\mathcal{T},i}(n), \qquad i = 0, 1, \ldots, N-1 \tag{7.27}$$

where $\hat{\sigma}^2_{x_{\mathcal{T},i}}(n)$ is an estimate of $E[x_{\mathcal{T},i}^2(n)]$. This shows that the presence of $\hat{\mathbf{D}}^{-1}$ in
Eq. (7.26) is equivalent to using different step-size parameters at various taps of the
TDAF. Each step-size parameter is chosen proportional to the inverse of the power of
its associated input signal. Noting this, we refer to Eq. (7.26) as a step-normalized LMS
recursion. In the present literature, the term normalized LMS algorithm has often been
used to refer to Eq. (7.26) (Narayan et al., 1983; Marshall et al., 1989). In this book,
we use the term step-normalized (when necessary) to prevent any confusion between
the normalization applied to TDAFs and the normalized LMS algorithm, which was
introduced in Chapter 6 (Section 6.6).

In the implementation of Eq. (7.26), one needs to obtain the estimates of the signal
powers at various taps of the filter, that is, the $\hat{\sigma}^2_{x_{\mathcal{T},i}}(n)$'s. The following recursions are usually
used for this purpose:

$$\hat{\sigma}^2_{x_{\mathcal{T},i}}(n) = \beta\,\hat{\sigma}^2_{x_{\mathcal{T},i}}(n-1) + (1-\beta)\,x_{\mathcal{T},i}^2(n), \qquad i = 0, 1, \ldots, N-1 \tag{7.28}$$

where $\beta$ is a positive constant close to but less than 1. This recursion estimates the power
by calculating a weighted average of the present and past samples of $x_{\mathcal{T},i}^2(n)$, using
an exponential weighting function given by $1, \beta, \beta^2, \ldots$ (Problem P7.2). The TDLMS
algorithm, including this signal power estimation, is summarized in Table 7.1.
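
The exponential weighting implied by the recursion (7.28) can be made explicit with a small example. The sketch below, with a hypothetical function name, a zero initial estimate, and arbitrary sample values, runs the recursion on a short sequence and compares the last estimate with the weighted sum $(1-\beta)\sum_k \beta^k x^2(n-k)$.

```python
import numpy as np

def power_estimate(x_T, beta=0.95, init=0.0):
    """Recursive power estimate of Eq. (7.28) for one transform-domain tap."""
    est = np.empty(len(x_T), dtype=float)
    prev = init                                  # assumed initial estimate
    for n, sample in enumerate(x_T):
        prev = beta * prev + (1 - beta) * sample ** 2
        est[n] = prev
    return est

x_T = np.array([1.0, 2.0, 3.0])                  # arbitrary samples of one tap
beta = 0.5
est = power_estimate(x_T, beta=beta)

# Direct evaluation: weights 1, beta, beta^2, ... on present and past squared samples.
weights = beta ** np.arange(len(x_T))
direct = (1 - beta) * np.sum(weights * x_T[::-1] ** 2)

print("recursive estimates:", est)
print("direct weighted sum:", direct)
```

A value of $\beta$ closer to 1 gives a longer memory and a smoother estimate, at the cost of slower tracking when the tap power changes.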

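Table 7.1 Summary of the TDLMS algorithm.

Input:   Tap-weight vector, $\mathbf{w}_{\mathcal{T}}(n)$,
         input vector, $\mathbf{x}(n)$,
         past tap-input power estimates, $\hat{\sigma}^2_{x_{\mathcal{T},i}}(n-1)$,
         and desired output, $d(n)$.

Output:  Filter output, $y(n)$,
         tap-input power estimate updates, $\hat{\sigma}^2_{x_{\mathcal{T},i}}(n)$,
         and tap-weight vector update, $\mathbf{w}_{\mathcal{T}}(n+1)$.

1. Transformation:
   $\mathbf{x}_{\mathcal{T}}(n) = \mathcal{T}\mathbf{x}(n)$

2. Filtering:
   $y(n) = \mathbf{w}_{\mathcal{T}}^T(n)\mathbf{x}_{\mathcal{T}}(n)$

3. Error estimation:
   $e(n) = d(n) - y(n)$

4. Tap-input power estimate update:
   for $i = 0$ to $N-1$
       $\hat{\sigma}^2_{x_{\mathcal{T},i}}(n) = \beta\,\hat{\sigma}^2_{x_{\mathcal{T},i}}(n-1) + (1-\beta)\,x_{\mathcal{T},i}^2(n)$

5. Tap-weight vector adaptation:
   $\mathbf{w}_{\mathcal{T}}(n+1) = \mathbf{w}_{\mathcal{T}}(n) + 2\mu\hat{\mathbf{D}}^{-1}e(n)\mathbf{x}_{\mathcal{T}}(n)$
   where $\hat{\mathbf{D}} = \mathrm{diag}[\hat{\sigma}^2_{x_{\mathcal{T},0}}(n), \hat{\sigma}^2_{x_{\mathcal{T},1}}(n), \ldots, \hat{\sigma}^2_{x_{\mathcal{T},N-1}}(n)]$


The step-normalization, as applied in Eq. (7.26), is equivalent to the normalization of
the elements of the transformed tap-input vector, $\mathbf{x}_{\mathcal{T}}(n)$, to the power of unity. To show
this, we multiply Eq. (7.26) on both sides by $\hat{\mathbf{D}}^{1/2}$ (the diagonal matrix consisting of
the square roots of the diagonal elements of $\hat{\mathbf{D}}$) and define $\mathbf{w}_{\mathcal{T}}^N(n) = \hat{\mathbf{D}}^{1/2}\mathbf{w}_{\mathcal{T}}(n)$ and
$\mathbf{x}_{\mathcal{T}}^N(n) = \hat{\mathbf{D}}^{-1/2}\mathbf{x}_{\mathcal{T}}(n)$, to obtain

$$\mathbf{w}_{\mathcal{T}}^N(n+1) = \mathbf{w}_{\mathcal{T}}^N(n) + 2\mu e(n)\mathbf{x}_{\mathcal{T}}^N(n) \tag{7.29}$$

We may also note that

$$\begin{aligned}
e(n) &= d(n) - y(n) \\
&= d(n) - \mathbf{w}_{\mathcal{T}}^T(n)\mathbf{x}_{\mathcal{T}}(n) \\
&= d(n) - \mathbf{w}_{\mathcal{T}}^{N\,T}(n)\mathbf{x}_{\mathcal{T}}^N(n)
\end{aligned} \tag{7.30}$$

Equations (7.29) and (7.30) suggest that the TDLMS algorithm, in effect, is equivalent to
a conventional LMS algorithm with the normalized tap-input vector $\mathbf{x}_{\mathcal{T}}^N(n)$.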

The significance of this result is that as Eq. (7.29) is a conventional LMS recursion,
the analytical results of the last chapter can immediately be applied to evaluate the
performance of the TDLMS algorithm. In particular, we note that the various modes of
convergence of the TDLMS algorithm are determined by the eigenvalues of the
correlation matrix $\mathbf{R}_{\mathcal{T}}^N = E[\mathbf{x}_{\mathcal{T}}^N(n)\mathbf{x}_{\mathcal{T}}^{N\,T}(n)]$. This matches our conjecture in the last section. Also,
by substituting the eigenvalues of $\mathbf{R}_{\mathcal{T}}^N$ for $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$ in Eqs. (6.61), (6.63), or
(6.64), the misadjustment of the TDLMS algorithm can be evaluated. In particular, we note
that $\mathrm{tr}[\mathbf{R}_{\mathcal{T}}^N] = N$ as the diagonal elements of $\mathbf{R}_{\mathcal{T}}^N$ are all normalized to unity. Thus, using
Eq. (6.64), the misadjustment of the TDLMS algorithm is obtained as

$$\mathcal{M} \approx \mu N \tag{7.31}$$
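
To make the steps of the TDLMS algorithm concrete, the following is a minimal sketch of it in a system modeling setup. It is not the program that accompanies this book: the DCT is used as the transform, and the plant coefficients, the noise level, the step size, the value of $\beta$, and the unit initial power estimates are all assumptions chosen only for illustration.

```python
import numpy as np

def dct_matrix(N):
    """N-by-N DCT matrix whose (k, l) element is c_kl of Eq. (7.14)."""
    k = np.arange(N).reshape(-1, 1)
    l = np.arange(N).reshape(1, -1)
    T = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * l + 1) * k / (2 * N))
    T[0, :] = 1.0 / np.sqrt(N)
    return T

def tdlms(x, d, N, mu, beta=0.99, init_power=1.0):
    """TDLMS sketch: returns the final transform-domain weights and the error sequence."""
    T = dct_matrix(N)
    w_T = np.zeros(N)
    power = np.full(N, init_power)       # assumed initial tap-input power estimates
    x_buf = np.zeros(N)                  # holds [x(n), x(n-1), ..., x(n-N+1)]
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.concatenate(([x[n]], x_buf[:-1]))
        x_T = T @ x_buf                                  # transformation, Eq. (7.1)
        e[n] = d[n] - w_T @ x_T                          # filtering and error, Eqs. (7.3), (7.4)
        power = beta * power + (1 - beta) * x_T ** 2     # power estimates, Eq. (7.28)
        w_T = w_T + 2 * mu * e[n] * x_T / power          # step-normalized update, Eq. (7.26)
    return w_T, e

rng = np.random.default_rng(0)
alpha, N, n_samples = 0.9, 4, 5000
u = rng.standard_normal(n_samples)
x = np.zeros(n_samples)
for n in range(n_samples):               # first-order autoregressive input, Eq. (7.17)
    x[n] = alpha * (x[n - 1] if n > 0 else 0.0) + np.sqrt(1 - alpha ** 2) * u[n]

w_plant = np.array([1.0, -0.5, 0.25, 0.1])                       # hypothetical plant
d = np.convolve(x, w_plant)[:n_samples] + 0.01 * rng.standard_normal(n_samples)

w_T, e = tdlms(x, d, N, mu=0.01)
w_time = dct_matrix(N).T @ w_T           # back to the time domain, from Eq. (7.12)
print("estimated plant coefficients:", np.round(w_time, 3))
```

The division by `power` implements the multiplication by the inverse of the diagonal matrix of power estimates, so each tap effectively runs with its own step size.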


7.5 Ideal LMS-Newton Algorithm and Its Relationship with TDLMS 


In Chapter 5, we introduced two search methods: the method of steepest-descent and
Newton's algorithm. The LMS algorithm was introduced in Chapter 6 as a stochastic
implementation of the method of steepest-descent. In the LMS algorithm, the gradient
vector $\nabla_{\mathbf{w}}\xi$ is replaced by its instantaneous estimate, $\nabla_{\mathbf{w}}e^2(n)$. A similar substitution of
the gradient vector in the Newton method, given by Eq. (5.44), results in the recursion

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\,\mathbf{R}^{-1}e(n)\,\mathbf{x}(n) \tag{7.32}$$


which may be called the ideal LMS-Newton algorithm. The term ideal refers to the fact that
knowledge of the true $\mathbf{R}^{-1}$ is assumed here. In actual practice, of course, this cannot
be true. One can only obtain an estimate of $\mathbf{R}^{-1}$. Methods for obtaining such estimates
are available in the literature (see e.g., Widrow and Stearns (1985); Marshall and Jenkins
(1992); Farhang-Boroujeny (1993)). Our aim in this section is to show that there is a close
relationship between the LMS-Newton and TDLMS algorithms. We show that when the
transformation matrix $\mathcal{T}$ is selected to be the Karhunen-Loève transform (KLT) of the
filter input, the TDLMS and LMS-Newton are two different formulations of the same
algorithm. Thus, we conclude that when a proper transformation is used, the TDLMS
algorithm may be considered as an efficient implementation of the LMS-Newton algorithm.

We recall that the correlation matrix $\mathbf{R}$ can be decomposed as $\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{\mathrm{T}}$, where $\mathbf{Q}$
is the N-by-N matrix whose columns are the eigenvectors of $\mathbf{R}$, and $\boldsymbol{\Lambda}$ is the diagonal
matrix consisting of the associated eigenvalues of $\mathbf{R}$. This, in turn, implies that

$$\mathbf{R}^{-1} = \mathbf{Q}\boldsymbol{\Lambda}^{-1}\mathbf{Q}^{\mathrm{T}} \tag{7.33}$$

because $\mathbf{Q}\mathbf{Q}^{\mathrm{T}} = \mathbf{I}$.
Substituting Eq. (7.33) in Eq. (7.32) and premultiplying the result by $\mathbf{Q}^{\mathrm{T}}$, we obtain

$$\mathbf{w}'(n+1) = \mathbf{w}'(n) + 2\mu\,\boldsymbol{\Lambda}^{-1}e(n)\,\mathbf{x}'(n) \tag{7.34}$$

where $\mathbf{w}'(n) = \mathbf{Q}^{\mathrm{T}}\mathbf{w}(n)$ and $\mathbf{x}'(n) = \mathbf{Q}^{\mathrm{T}}\mathbf{x}(n)$. Also, from our discussions in Chapter 4,
we recall that the eigenvalues of $\mathbf{R}$, that is, the diagonal elements of $\boldsymbol{\Lambda}$, are equal to the
powers (mean-squared values) of the elements of the vector $\mathbf{x}'(n)$. Combining this with
our discussion in the previous section, we see that the LMS-Newton algorithm is an
alternative formulation of the TDLMS algorithm when $\mathcal{T} = \mathbf{Q}^{\mathrm{T}}$. Furthermore, we note
that for a given input process, x(n), with correlation matrix $\mathbf{R}$, the transform $\mathcal{T} = \mathbf{Q}^{\mathrm{T}}$ is
the ideal one in the sense that it results in a diagonal $\mathbf{R}_{\mathcal{T}} = \boldsymbol{\Lambda}$. When $\mathcal{T} \neq \mathbf{Q}^{\mathrm{T}}$, but results
in an approximately diagonal $\mathbf{R}_{\mathcal{T}}$, we may say that the TDLMS algorithm is equivalent
to a quasi LMS-Newton algorithm.
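The equivalence can be checked numerically on a small case. The sketch below assumes a two-tap filter whose input has unit power and a lag-1 correlation of 0.9, so that the eigenvectors and eigenvalues are known in closed form; the function names and the test values of the weights, input, error, and step size are arbitrary choices made for illustration.

```python
import math

# Assumed 2-by-2 example: R = [[1, 0.9], [0.9, 1]] and its inverse
R_INV_SCALE = 1.0 / (1.0 - 0.9 ** 2)
R_inv = [[R_INV_SCALE, -0.9 * R_INV_SCALE],
         [-0.9 * R_INV_SCALE, R_INV_SCALE]]
s = 1.0 / math.sqrt(2.0)
Q = [[s, s], [s, -s]]        # columns are the eigenvectors of R
lam = [1.9, 0.1]             # matching eigenvalues

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def newton_step(w, x, e, mu):
    """Eq. (7.32): w(n+1) = w(n) + 2 mu R^{-1} e(n) x(n)."""
    g = matvec(R_inv, x)
    return [wi + 2.0 * mu * e * gi for wi, gi in zip(w, g)]

def klt_domain_step(w, x, e, mu):
    """Eq. (7.34) in the rotated coordinates, mapped back with Q."""
    Qt = transpose(Q)
    w_p, x_p = matvec(Qt, w), matvec(Qt, x)
    w_p = [wi + 2.0 * mu * e * xi / li for wi, xi, li in zip(w_p, x_p, lam)]
    return matvec(Q, w_p)

w0, x0, e0, mu0 = [0.3, -0.2], [1.0, 0.4], 0.7, 0.05
w_newton = newton_step(w0, x0, e0, mu0)
w_klt = klt_domain_step(w0, x0, e0, mu0)
print(w_newton, w_klt)
```

The two updates should agree to within rounding error, which is the sense in which the TDLMS recursion with the KLT and exact power normalization is a reformulation of the ideal LMS-Newton recursion.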


7.6 Selection of the Transform T


It turns out that for a given process, x(n), the performance of the TDLMS algorithm may
vary significantly depending on the selection of the transformation matrix, T. A transform
that performs well for a given input process may perform poorly once the statistics of
the input change. This effect is more prominent when the filter length is short. For
long filters, we find that most of the commonly used transforms perform well and result in
a significant performance improvement compared with the conventional LMS algorithm.

In this section, we present some theoretical results that explain these observations. 
This presentation includes a geometrical interpretation of the TDLMS process, which 
will be given for a two-tap filter. This interpretation will then be generalized using a 
special performance index, which is also introduced in this section. This leads to a very 
instructive view of the band-partitioning property of orthogonal transforms, which helps 
one to select a proper transform once a rough estimate of the power spectral density of 
the underlying input process is known. 


7.6.1 A Geometrical Interpretation 


The geometrical interpretation presented here has been adapted from Marshall et al.
(1989). We recall that the performance surface of a transversal filter with input correlation
matrix $\mathbf{R}$ may be written as

$$\xi(\mathbf{v}) = \xi_{\min} + \mathbf{v}^{\mathrm{T}}\mathbf{R}\mathbf{v} \tag{7.35}$$

where the vector $\mathbf{v}$ is the difference between the filter tap-weight vector, $\mathbf{w}$, and its
optimum value, $\mathbf{w}_{\mathrm{o}}$.
For the sake of illustrating the principles, let us consider a two-tap filter problem with
$\mathbf{R}$ given by

$$\mathbf{R} = \begin{bmatrix} 1.0 & 0.9 \\ 0.9 & 1.0 \end{bmatrix} \tag{7.36}$$

With this $\mathbf{R}$, Eq. (7.35) becomes

$$\xi(v_0, v_1) = \xi_{\min} + v_0^2 + v_1^2 + 1.8\,v_0 v_1 \tag{7.37}$$
Figure 7.3 A geometrical interpretation of the TDLMS algorithm. (a) Performance surface
before transformation; (b) performance surface after transformation, but without normalization;
and (c) performance surface after transformation and normalization (adapted from Marshall et al.
(1989)).


Figure 7.3a shows the contour plot associated with this performance surface. As we may
recall from our discussions in the previous chapters, the eccentricity of the contour ellipses
in Figure 7.3a is related to the eigenvalue spread of the correlation matrix R. A large
eccentricity is due to a large eigenvalue spread and that, in turn, results in certain slow
mode(s) of convergence when the conventional LMS algorithm is used to adjust the filter
tap weights.

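Application of an orthogonal transform, $\mathcal{T}$, converts the tap-input vector $\mathbf{x}(n)$ to
$\mathbf{x}_{\mathcal{T}}(n) = \mathcal{T}\mathbf{x}(n)$, whose associated correlation matrix, $\mathbf{R}_{\mathcal{T}}$, is related to $\mathbf{R}$ according to
Eq. (7.9). As a numerical example, let us choose

$$\mathcal{T} = \begin{bmatrix} 0.8 & 0.6 \\ -0.6 & 0.8 \end{bmatrix} \tag{7.38}$$

This, with $\mathbf{R}$ as given in Eq. (7.36), results in

$$\mathbf{R}_{\mathcal{T}} = \begin{bmatrix} 1.864 & 0.252 \\ 0.252 & 0.136 \end{bmatrix} \tag{7.39}$$

If no normalization is applied to the transformed samples, the performance surface
associated with the TDAF will be

$$\xi_{\mathcal{T}}(\mathbf{v}_{\mathcal{T}}) = \xi_{\min} + \mathbf{v}_{\mathcal{T}}^{\mathrm{T}}\mathbf{R}_{\mathcal{T}}\mathbf{v}_{\mathcal{T}} \tag{7.40}$$

which for the present numerical example can be expanded as

$$\xi_{\mathcal{T}}(v_{\mathcal{T},0}, v_{\mathcal{T},1}) = \xi_{\min} + 1.864\,v_{\mathcal{T},0}^2 + 0.136\,v_{\mathcal{T},1}^2 + 0.504\,v_{\mathcal{T},0}v_{\mathcal{T},1} \tag{7.41}$$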


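Figure 7.3b shows the contours associated with the performance surface defined by
Eq. (7.41). Note that the effect of the transformation is only to rotate the performance
surface with respect to the coordinate axes. The shape of the performance surface, that
is, the eccentricity of the contour ellipses, has not changed. This can be mathematically
explained by noting that, as $\mathcal{T}\mathcal{T}^{\mathrm{T}} = \mathcal{T}^{\mathrm{T}}\mathcal{T} = \mathbf{I}$,

$$\begin{aligned}
\xi(\mathbf{v}) &= \xi_{\min} + \mathbf{v}^{\mathrm{T}}\mathbf{R}\mathbf{v} \\
 &= \xi_{\min} + \mathbf{v}^{\mathrm{T}}\mathcal{T}^{\mathrm{T}}\mathcal{T}\mathbf{R}\mathcal{T}^{\mathrm{T}}\mathcal{T}\mathbf{v} \\
 &= \xi_{\min} + \mathbf{v}_{\mathcal{T}}^{\mathrm{T}}\mathbf{R}_{\mathcal{T}}\mathbf{v}_{\mathcal{T}} = \xi_{\mathcal{T}}(\mathbf{v}_{\mathcal{T}})
\end{aligned} \tag{7.42}$$

where $\mathbf{v}$ and $\mathbf{v}_{\mathcal{T}}$ are related according to the equation $\mathbf{v}_{\mathcal{T}} = \mathcal{T}\mathbf{v}$. This result, which can
also be written as $\xi_{\mathcal{T}}(\mathbf{v}_{\mathcal{T}}) = \xi(\mathcal{T}^{\mathrm{T}}\mathbf{v}_{\mathcal{T}})$, shows that the performance surface defined by
Eq. (7.40) is obtained from the one defined by Eq. (7.35) by a rotation of the coordinate
axes according to the relationship $\mathbf{v}_{\mathcal{T}} = \mathcal{T}\mathbf{v}$, or, equivalently, by keeping the coordinate
axes fixed and rotating the performance surface in the opposite direction. This observation
shows that transformation without normalization has no effect on the convergence behavior
of the steepest-descent method and, thus, the LMS algorithm. Thus, we emphasize that
normalization has to be considered as an integral part of any transform domain adaptive
algorithm (as introduced in the case of the TDLMS algorithm, in Section 7.4); otherwise,
transformation adds to the filter complexity without any gain in convergence.

When the elements of $\mathbf{x}_{\mathcal{T}}(n)$ are normalized to the power of unity$^4$, the corresponding
correlation matrix is given by Eq. (7.22) and its associated performance surface is defined
as

$$\xi_{\mathcal{T}}^{\mathrm{N}}(\mathbf{v}_{\mathcal{T}}^{\mathrm{N}}) = \xi_{\min} + \left(\mathbf{v}_{\mathcal{T}}^{\mathrm{N}}\right)^{\mathrm{T}}\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}\mathbf{v}_{\mathcal{T}}^{\mathrm{N}} \tag{7.43}$$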


For the present example, we obtain

$$\mathbf{R}_{\mathcal{T}}^{\mathrm{N}} = \begin{bmatrix} 1.0000 & 0.5005 \\ 0.5005 & 1.0000 \end{bmatrix} \tag{7.44}$$

and

$$\xi_{\mathcal{T}}^{\mathrm{N}}(v_{\mathcal{T},0}^{\mathrm{N}}, v_{\mathcal{T},1}^{\mathrm{N}}) = \xi_{\min} + (v_{\mathcal{T},0}^{\mathrm{N}})^2 + (v_{\mathcal{T},1}^{\mathrm{N}})^2 + 1.001\,v_{\mathcal{T},0}^{\mathrm{N}}v_{\mathcal{T},1}^{\mathrm{N}} \tag{7.45}$$


$^4$ We recall that the step-normalization, as applied in Eq. (7.26), and normalization of the elements of $\mathbf{x}_{\mathcal{T}}(n)$ to the
power of unity are equivalent.

Figure 7.3c shows the contours associated with Eq. (7.45). We note that the normalization
reduces the eccentricity of the performance surface. This, of course, will result in a faster
convergence of the TDLMS algorithm compared to the conventional LMS algorithm.
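The numbers in this example are easy to reproduce. The short sketch below recomputes the transformed and normalized correlation matrices and their eigenvalue spreads; the helper names are ours, and the closed-form spread formula applies only to symmetric positive-definite 2-by-2 matrices.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def spread_2x2(M):
    """Eigenvalue spread of a symmetric positive-definite 2-by-2 matrix."""
    mean = 0.5 * (M[0][0] + M[1][1])
    radius = math.hypot(0.5 * (M[0][0] - M[1][1]), M[0][1])
    return (mean + radius) / (mean - radius)

R = [[1.0, 0.9], [0.9, 1.0]]
T = [[0.8, 0.6], [-0.6, 0.8]]
R_T = mat_mul(mat_mul(T, R), transpose(T))             # T R T^T
d = [1.0 / math.sqrt(R_T[i][i]) for i in range(2)]     # diagonal of D^{-1/2}
R_TN = [[d[i] * R_T[i][j] * d[j] for j in range(2)] for i in range(2)]

print("R_T  =", R_T)
print("R_TN =", R_TN)
print("spreads:", spread_2x2(R), spread_2x2(R_T), spread_2x2(R_TN))
```

The spread is unchanged by the rotation alone and drops from 19 to roughly 3 once the transformed samples are normalized, which is the reduction in eccentricity visible between Figures 7.3b and 7.3c.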

A better insight into the effect of normalization is obtained by making the following
observations. The hyperellipses associated with the performance surface of a transversal
filter are hyperspherical at the points of their intersection with the v-axes, that is, the
intersection points of each contour (hyperellipse) with the v-axes are at equal distance
from the origin. This, which is clearly observed in Figure 7.3a, can be shown to be true
in general if we note that, for any i,

$$\xi_i(v_i) = \xi_{\min} + r_{ii}v_i^2$$

where $r_{ii}$ is the ith diagonal element of $\mathbf{R}$, and $\xi_i(v_i)$ is the performance function $\xi(\mathbf{v})$
when all elements of the vector $\mathbf{v}$, except its ith element, $v_i$, have been set equal to zero.
We also note that for a transversal filter, $r_{ii}$ is the same for all values of i. Thus, the
identity

$$\xi_i(v_i) = \xi_j(v_j), \quad \text{for all } i \text{ and } j$$

implies that

$$|v_i| = |v_j|$$

which, in turn, shows that the hyperellipses associated with the performance surface of a
transversal filter are hyperspherical at the points of their intersection with the v-axes.

Following the same argument, we find that as the diagonal elements of $\mathbf{R}_{\mathcal{T}}$ are likely
to be unequal (unless the underlying input process, x(n), is white, i.e., when $\mathbf{R}$ is a
multiple of the identity matrix), the contour ellipses associated with $\xi_{\mathcal{T}}(\mathbf{v}_{\mathcal{T}})$ are most
likely nonhyperspherical at the points of their intersection with the $v_{\mathcal{T}}$-axes. This is
clearly observed in Figure 7.3b.

On the other hand, normalization of the transformed samples to the power of unity
equalizes the diagonal elements of $\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}$. Thus, the hyperellipses associated with the
performance surface $\xi_{\mathcal{T}}^{\mathrm{N}}(\mathbf{v}_{\mathcal{T}}^{\mathrm{N}})$ are hyperspherical at the points of their intersection with the
corresponding coordinate axes. To get a better insight, we may also note that

$$\begin{aligned}
\xi_{\mathcal{T}}(\mathbf{v}_{\mathcal{T}}) &= \xi_{\min} + \mathbf{v}_{\mathcal{T}}^{\mathrm{T}}\mathbf{R}_{\mathcal{T}}\mathbf{v}_{\mathcal{T}} \\
 &= \xi_{\min} + \mathbf{v}_{\mathcal{T}}^{\mathrm{T}}\mathbf{D}^{1/2}\mathbf{D}^{-1/2}\mathbf{R}_{\mathcal{T}}\mathbf{D}^{-1/2}\mathbf{D}^{1/2}\mathbf{v}_{\mathcal{T}} \\
 &= \xi_{\min} + (\mathbf{D}^{1/2}\mathbf{v}_{\mathcal{T}})^{\mathrm{T}}\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}(\mathbf{D}^{1/2}\mathbf{v}_{\mathcal{T}}) = \xi_{\mathcal{T}}^{\mathrm{N}}(\mathbf{v}_{\mathcal{T}}^{\mathrm{N}})
\end{aligned} \tag{7.46}$$

where $\mathbf{v}_{\mathcal{T}}^{\mathrm{N}} = \mathbf{D}^{1/2}\mathbf{v}_{\mathcal{T}}$, and we have noted that $(\mathbf{D}^{1/2})^{\mathrm{T}} = \mathbf{D}^{1/2}$ as $\mathbf{D}$ is a diagonal matrix.
This result, which can also be written as $\xi_{\mathcal{T}}^{\mathrm{N}}(\mathbf{v}_{\mathcal{T}}^{\mathrm{N}}) = \xi_{\mathcal{T}}(\mathbf{D}^{-1/2}\mathbf{v}_{\mathcal{T}}^{\mathrm{N}})$, shows that the
performance surface defined by Eq. (7.43) is obtained from the one defined by Eq. (7.40) by
scaling its coordinate axes according to the relationship $\mathbf{v}_{\mathcal{T}}^{\mathrm{N}} = \mathbf{D}^{1/2}\mathbf{v}_{\mathcal{T}}$. For the example
shown in Figure 7.3, this is equivalent to stretching the contour ellipses of Figure 7.3b
along the $v_{\mathcal{T},0}$-axis and shrinking them along the $v_{\mathcal{T},1}$-axis. This clearly reduces the eccentricity
of the ellipses. Furthermore, we note that the ellipses in Figure 7.3c would become circles,
resulting in maximum improvement in convergence, if the ellipses in Figure 7.3b had been
rotated so that their principal axes would be along the $v_{\mathcal{T}}$-axes. It is interesting to note that
this corresponds to the case where $\mathcal{T}$ is the Karhunen-Loève transform (KLT) associated
with the correlation matrix $\mathbf{R}$. We may also recall from Chapter 4 that in the latter case
$\mathbf{R}_{\mathcal{T}} = \boldsymbol{\Lambda}$, that is, the diagonal matrix consisting of the eigenvalues of $\mathbf{R}$. Furthermore,
from the minimax theorem (of Chapter 4), we find that this corresponds to the case where
the diagonal elements of $\mathbf{R}_{\mathcal{T}}$ are maximally spread. Moreover, a closer look at the present
example reveals that the effect of normalization in reducing the eccentricity of the contour
ellipses depends on the relative size (i.e., spread) of the diagonal elements of $\mathbf{R}_{\mathcal{T}}$. In other
words, the spread of the signal power at the filter taps after transformation appears to be
the key factor that determines the success of a TDAF. The discussion that follows in the
rest of this section aims at exploring this aspect of TDAFs further.


7.6.2 A Useful Performance Index 


In the study of the LMS algorithm, the eigenvalue spread, that is, $\lambda_{\max}/\lambda_{\min}$, of the
correlation matrix, $\mathbf{R}$, of the underlying input process is the most widely used performance
index. So far in this book, we have also emphasized the significance of the eigenvalue
spread. Unfortunately, there is no way of getting closed-form (explicit) equations for the
maximum, $\lambda_{\max}$, and minimum, $\lambda_{\min}$, eigenvalues of a matrix $\mathbf{R}$, in general. As a result,
application of this index for any further study of the TDLMS algorithm and its comparison
with the conventional LMS algorithm is not possible. Hence, we shall look for other
possible performance indices that may be mathematically tractable.

Farhang-Boroujeny and Gazor (1991, 1992) proposed an index that is mathematically
tractable and able to give some further insight into the effect of orthogonal transforms in
improving the performance of the LMS algorithm. The proposed index is

$$\rho(\mathbf{R}) = \left(\frac{\lambda_a}{\lambda_g}\right)^N \tag{7.47}$$

where $\lambda_a$ and $\lambda_g$ are the arithmetic and geometric averages, respectively, of the eigenvalues
of $\mathbf{R}$. Namely,

$$\lambda_a = \frac{1}{N}\sum_{i=0}^{N-1}\lambda_i \tag{7.48}$$

and

$$\lambda_g = \sqrt[N]{\prod_{i=0}^{N-1}\lambda_i} \tag{7.49}$$


We note that the value of $\rho(\mathbf{R})$ depends on the distribution of the eigenvalues of $\mathbf{R}$. It is
always greater than or equal to 1. It approaches 1 when all the eigenvalues of $\mathbf{R}$ assume
about the same values and increases as the eigenvalues of $\mathbf{R}$ spread apart. Furthermore,
the lower bound $\rho(\mathbf{R}) = 1$ is reached when the eigenvalues of $\mathbf{R}$ are all equal. Using the
identities (Chapter 4)

$$\sum_{i=0}^{N-1}\lambda_i = \mathrm{tr}[\mathbf{R}]$$

where $\mathrm{tr}[\mathbf{R}]$ is the trace of $\mathbf{R}$, and

$$\prod_{i=0}^{N-1}\lambda_i = \det[\mathbf{R}]$$

where $\det[\mathbf{R}]$ is the determinant of $\mathbf{R}$, we obtain

$$\rho(\mathbf{R}) = \frac{(\mathrm{tr}[\mathbf{R}]/N)^N}{\det[\mathbf{R}]} \tag{7.50}$$

Now, one may appreciate the index $\rho(\mathbf{R})$ because of its closed-form nature in terms of
the elements of $\mathbf{R}$.

Before we proceed with the application of the performance index $\rho(\mathbf{R})$ to further study
the TDLMS algorithm, we may remark that the relationship between $\rho(\mathbf{R})$ and the
eigenvalue spread of $\mathbf{R}$, that is, $\lambda_{\max}/\lambda_{\min}$, is rather complicated. The index $\rho(\mathbf{R})$ depends not
only on $\lambda_{\max}/\lambda_{\min}$, but also on the distribution of the rest of the eigenvalues of $\mathbf{R}$ in the range $\lambda_{\min}$
to $\lambda_{\max}$. However, the general trend is that a large eigenvalue spread of $\mathbf{R}$ implies a large
$\rho(\mathbf{R})$ and vice versa. Similarly, a $\rho(\mathbf{R})$ close to 1 implies that the eigenvalue spread of $\mathbf{R}$ is
small. Figure 7.4 shows how $\rho(\mathbf{R})$ varies as a function of $\lambda_{\max}/\lambda_{\min}$ when N = 10 and the
eigenvalues of $\mathbf{R}$ are assumed to be a set of random numbers distributed in the range 0 to 1.

Figure 7.4 Variation of $\rho(\mathbf{R})$ versus eigenvalue spread of $\mathbf{R}$.
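As a quick illustration of why the closed form is convenient, the sketch below evaluates the index both from the trace and determinant and from a known set of eigenvalues. The function names are ours, and the determinant routine is a plain Gaussian elimination written for this example rather than a library call.

```python
import math

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, result = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return result

def rho(M):
    """Closed form: rho(R) = (tr[R]/N)^N / det[R]."""
    n = len(M)
    trace = sum(M[i][i] for i in range(n))
    return (trace / n) ** n / det(M)

def rho_from_eigs(eigs):
    """Definition: (arithmetic mean / geometric mean)^N of the eigenvalues."""
    n = len(eigs)
    lam_a = sum(eigs) / n
    lam_g = math.prod(eigs) ** (1.0 / n)
    return (lam_a / lam_g) ** n

R = [[1.0, 0.9], [0.9, 1.0]]          # eigenvalues 1.9 and 0.1
identity3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(rho(R), rho_from_eigs([1.9, 0.1]), rho(identity3))
```

For this two-tap example both routes give $1/0.19 \approx 5.26$, while a white input (identity correlation matrix) attains the lower bound of 1.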


7.6.3 Improvement Factor and Comparisons 


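To compare a pair of LMS-based algorithms, say LMS$_1$ and LMS$_2$, we define an improvement
factor, $I_\rho$, as the natural logarithm of the ratio of the performance index $\rho(\cdot)$ in the
two cases. In particular, when LMS$_2$ is compared with LMS$_1$, we define

$$I_\rho = \ln\rho(\mathbf{R}_1) - \ln\rho(\mathbf{R}_2) \tag{7.51}$$

where $\mathbf{R}_1$ and $\mathbf{R}_2$ are the associated correlation matrices in LMS$_1$ and LMS$_2$, respectively.
A positive $I_\rho$ indicates that LMS$_2$ is superior, and a negative $I_\rho$ indicates that LMS$_2$ is
inferior. In comparing a TDLMS algorithm and its conventional LMS counterpart, we
shall let $\mathbf{R}_1 = \mathbf{R}$ and $\mathbf{R}_2 = \mathbf{R}_{\mathcal{T}}^{\mathrm{N}}$. On the other hand, if no normalization is applied in the
implementation of the TDLMS algorithm, we shall let $\mathbf{R}_2 = \mathbf{R}_{\mathcal{T}}$. Thus, for the latter case,
the corresponding improvement factor is

$$I_{\rho,\mathcal{T}} = \ln\rho(\mathbf{R}) - \ln\rho(\mathbf{R}_{\mathcal{T}}) \tag{7.52}$$

We note that

$$\rho(\mathbf{R}_{\mathcal{T}}) = \frac{(\mathrm{tr}[\mathbf{R}_{\mathcal{T}}]/N)^N}{\det[\mathbf{R}_{\mathcal{T}}]}
 = \frac{(\mathrm{tr}[\mathcal{T}\mathbf{R}\mathcal{T}^{\mathrm{T}}]/N)^N}{\det[\mathcal{T}\mathbf{R}\mathcal{T}^{\mathrm{T}}]} \tag{7.53}$$

To simplify this, we recall the following results of matrix algebra. If $\mathbf{A}$ and $\mathbf{B}$ are N-by-M
and M-by-N matrices, respectively, then

$$\mathrm{tr}[\mathbf{A}\mathbf{B}] = \mathrm{tr}[\mathbf{B}\mathbf{A}] \tag{7.54}$$

Also, when $\mathbf{A}$ and $\mathbf{B}$ are square matrices,

$$\det[\mathbf{A}\mathbf{B}] = \det[\mathbf{B}\mathbf{A}] = \det[\mathbf{A}]\cdot\det[\mathbf{B}] \tag{7.55}$$

Using Eqs. (7.54) and (7.55) in Eq. (7.53), we obtain

$$\rho(\mathbf{R}_{\mathcal{T}}) = \frac{(\mathrm{tr}[\mathcal{T}^{\mathrm{T}}\mathcal{T}\mathbf{R}]/N)^N}{\det[\mathcal{T}^{\mathrm{T}}\mathcal{T}\mathbf{R}]} = \rho(\mathbf{R}) \tag{7.56}$$

where the last equality follows as $\mathcal{T}^{\mathrm{T}}\mathcal{T} = \mathbf{I}$. Substituting Eq. (7.56) in Eq. (7.52), we get

$$I_{\rho,\mathcal{T}} = 0$$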


This shows that transformation without normalization has no effect in improving the
performance of the LMS algorithm. This result, which was also predicted by the geometrical
interpretation of the TDLMS algorithm (Figure 7.3), can also be understood if we
recall the definition of $\rho(\mathbf{R})$ (i.e., Eq. (7.47)) while noting that for an arbitrary orthogonal
transformation $\mathcal{T}$, with $\mathcal{T}\mathcal{T}^{\mathrm{T}} = \mathbf{I}$, the eigenvalues of $\mathbf{R}$ and $\mathbf{R}_{\mathcal{T}} = \mathcal{T}\mathbf{R}\mathcal{T}^{\mathrm{T}}$ are the same
(Problem P7.7).

Another case of interest to be noted here is the comparison of the conventional LMS
algorithm and the ideal LMS-Newton algorithm. In this case, $\mathbf{R}_1 = \mathbf{R}$ and $\mathbf{R}_2 = \mathbf{I}$. Thus,
the improvement achieved using an ideal LMS-Newton algorithm instead of its
conventional LMS counterpart is

$$\begin{aligned}
I_{\rho,\max} &= \ln\rho(\mathbf{R}) - \ln\rho(\mathbf{I}) \\
 &= \ln\rho(\mathbf{R})
\end{aligned} \tag{7.57}$$
as $\rho(\mathbf{I}) = 1$. The notation $I_{\rho,\max}$ reflects the fact that the ideal LMS-Newton algorithm
results in the maximum possible improvement that one can achieve by modifying the
conventional LMS algorithm. Furthermore, Eq. (7.57) indicates that $\ln\rho(\mathbf{R})$ can be considered
as a measure of the distance of the LMS algorithm from the ideal LMS-Newton algorithm.
Similarly, in evaluating a particular implementation of the TDLMS algorithm, the value of
$\ln\rho(\mathbf{R}_{\mathcal{T}}^{\mathrm{N}})$ shows the distance of the TDLMS algorithm from the ideal LMS-Newton
algorithm and, thus, it may be considered as a parameter indicating the extent of decorrelation
that is achieved by the transformation.

The following theorem shows an easy way to compare a transform-domain-normalized
LMS (TDNLMS) algorithm with its conventional LMS counterpart.


Theorem 7.1 When the conventional LMS algorithm is replaced by its TDLMS
counterpart, the resulting improvement factor is

$$I_{\rho,\mathcal{T}}^{\mathrm{N}} = \ln\rho(\mathrm{diag}[\mathbf{R}_{\mathcal{T}}]) \tag{7.58}$$

where $\mathbf{R}_{\mathcal{T}} = \mathcal{T}\mathbf{R}\mathcal{T}^{\mathrm{T}}$, $\mathbf{R}$ is the correlation matrix of the underlying input process, $\mathcal{T}$ is
the transformation matrix, and $\mathrm{diag}[\mathbf{R}_{\mathcal{T}}]$ denotes the diagonal matrix consisting of the
diagonal elements of $\mathbf{R}_{\mathcal{T}}$.


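Proof. According to Eq. (7.51), the improvement factor is

$$I_{\rho,\mathcal{T}}^{\mathrm{N}} = \ln\rho(\mathbf{R}) - \ln\rho(\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}) \tag{7.59}$$

We recall that the diagonal elements of the normalized N-by-N matrix $\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}$ are all equal
to 1. This implies that

$$\mathrm{tr}[\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}] = N \tag{7.60}$$

Noting this and using Eqs. (7.22) and (7.55), we may proceed as follows:

$$\begin{aligned}
\rho(\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}) &= \frac{(\mathrm{tr}[\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}]/N)^N}{\det[\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}]} \\
 &= \frac{1}{\det[\mathbf{D}^{-1/2}\mathbf{R}_{\mathcal{T}}\mathbf{D}^{-1/2}]} \\
 &= \frac{1}{\det[\mathbf{D}^{-1}\mathbf{R}_{\mathcal{T}}]} \\
 &= \frac{1}{\det[\mathbf{D}^{-1}]\cdot\det[\mathbf{R}_{\mathcal{T}}]} \\
 &= \frac{\det[\mathbf{D}]}{\det[\mathbf{R}_{\mathcal{T}}]}
\end{aligned} \tag{7.61}$$

where the last equality follows from the identity $\det[\mathbf{D}^{-1}] = (\det[\mathbf{D}])^{-1}$. Next, substituting
for $\mathbf{D}$ from Eq. (7.23) and noting that $\mathrm{tr}[\mathbf{R}_{\mathcal{T}}] = \mathrm{tr}[\mathrm{diag}[\mathbf{R}_{\mathcal{T}}]]$, we get

$$\begin{aligned}
\rho(\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}) &= \frac{\det[\mathrm{diag}[\mathbf{R}_{\mathcal{T}}]]}{\det[\mathbf{R}_{\mathcal{T}}]} \\
 &= \frac{\det[\mathrm{diag}[\mathbf{R}_{\mathcal{T}}]]}{(\mathrm{tr}[\mathrm{diag}[\mathbf{R}_{\mathcal{T}}]]/N)^N}\cdot\frac{(\mathrm{tr}[\mathbf{R}_{\mathcal{T}}]/N)^N}{\det[\mathbf{R}_{\mathcal{T}}]} \\
 &= \frac{\rho(\mathbf{R}_{\mathcal{T}})}{\rho(\mathrm{diag}[\mathbf{R}_{\mathcal{T}}])}
\end{aligned} \tag{7.62}$$

Substituting Eq. (7.62) in Eq. (7.59) and noting that $\rho(\mathbf{R}) = \rho(\mathbf{R}_{\mathcal{T}})$ completes the proof.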


The corollary is as follows:


Corollary 7.1 As $\ln\rho(\mathrm{diag}[\mathbf{R}_{\mathcal{T}}])$ is always nonnegative, the performance of a TDLMS
algorithm can never be worse than its conventional LMS counterpart.


The following remark may also be made. When comparing a TDLMS algorithm with
its conventional LMS counterpart, the degree of improvement achieved depends on the
distribution of the signal power at the various outputs of the transformation, that is, the
tap inputs $x_{\mathcal{T},i}(n)$. A wide spread of signal power at the taps indicates a significant
improvement. Similarly, a small spread in signal powers indicates that the achievable
improvement is small.
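The theorem and the earlier result for the unnormalized case can both be checked on the two-tap example of Section 7.6.1. The sketch below uses the numerical matrices of that example; the helper and variable names are assumptions made for illustration.

```python
import math

def rho_2x2(M):
    """rho(M) = (tr[M]/2)^2 / det[M] for a 2-by-2 matrix."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return (trace / 2.0) ** 2 / det

R = [[1.0, 0.9], [0.9, 1.0]]
R_T = [[1.864, 0.252], [0.252, 0.136]]             # transformed, unnormalized
diag_R_T = [[R_T[0][0], 0.0], [0.0, R_T[1][1]]]
c = R_T[0][1] / math.sqrt(R_T[0][0] * R_T[1][1])
R_TN = [[1.0, c], [c, 1.0]]                        # transformed and normalized

gain_unnormalized = math.log(rho_2x2(R)) - math.log(rho_2x2(R_T))
gain_normalized = math.log(rho_2x2(R)) - math.log(rho_2x2(R_TN))
gain_theorem = math.log(rho_2x2(diag_R_T))         # right-hand side of the theorem
gain_max = math.log(rho_2x2(R))                    # ideal LMS-Newton bound

print(gain_unnormalized, gain_normalized, gain_theorem, gain_max)
```

The unnormalized gain vanishes, the normalized gain coincides with $\ln\rho(\mathrm{diag}[\mathbf{R}_{\mathcal{T}}])$ (about 1.37 here), and both stay below the LMS-Newton bound $\ln\rho(\mathbf{R}) \approx 1.66$.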


7.6.4 Filtering View 


The quantitative result of the above theorem suggests that for a given input process, a
transformation matrix will effectively decorrelate the samples of the input if it implements
a set of parallel FIR filters whose output powers are close to maximally spread. The
notion of maximally spread signal powers, here, is quantified by the minimax theorem, which was
introduced in Chapter 4. When the correlation matrix of the underlying input process is
known, the minimax theorem suggests a procedure for the optimal selection of a set of
filters that achieve maximum power spreading. It starts with the design of a set of filters
(with orthogonal coefficient vectors) whose output powers are maximized. Alternatively, it may
start with the design of another set of filters whose output powers are minimized. We
also note that these two optimization procedures are implemented independently of each
other, but both result in the same set of eigenvectors. This gives an intuitive feeling of
how the minimax theorem (procedure) finds a transformation with a maximum spread of
signal powers at its outputs.

We note that while the minimax theorem suggests a procedure for the design of the
optimal transform for a given input process, the above theorem gives a measure of the
effectiveness of a transformation matrix in decorrelating the samples of an underlying input
process. We note that for a given input process with correlation matrix $\mathbf{R}$, the maximum
attainable improvement factor is $I_{\rho,\max} = \ln\rho(\mathbf{R})$, and this is achieved when $\mathcal{T}$ is the
KLT of the underlying input process. On the other hand, for a given transformation, $\mathcal{T}$,
$I_{\rho,\mathcal{T}}^{\mathrm{N}} = \ln\rho(\mathrm{diag}[\mathbf{R}_{\mathcal{T}}])$. Thus, the difference $I_{\rho,\max} - I_{\rho,\mathcal{T}}^{\mathrm{N}}$ gives a measure of the success
of $\mathcal{T}$ in decorrelating the input samples. A small value of $I_{\rho,\max} - I_{\rho,\mathcal{T}}^{\mathrm{N}}$ indicates that
the transformation used is close to optimal and vice versa. Furthermore, as explained in
Section 7.6.3, $I_{\rho,\max} - I_{\rho,\mathcal{T}}^{\mathrm{N}} = \ln\rho(\mathbf{R}_{\mathcal{T}}^{\mathrm{N}})$ is also the distance of the TDLMS from the ideal
LMS-Newton algorithm.
It is instructive to elaborate more on the power spreading effect of a transformation $\mathcal{T}$
and relate that to the above findings. We recall that the output power of a filter with the
transfer function $F(e^{j\omega})$, input x(n), and output y(n) is given by (Chapter 2)

$$E[y^2(n)] = \frac{1}{2\pi}\int_0^{2\pi}\Phi_{xx}(e^{j\omega})\,|F(e^{j\omega})|^2\,d\omega \tag{7.63}$$

where $\Phi_{xx}(e^{j\omega})$ is the power spectral density of x(n). Now, if $F(e^{j\omega})$ is the transfer
function of a filter whose coefficients constitute the elements of a row of a transformation
matrix $\mathcal{T}$, with $\mathcal{T}\mathcal{T}^{\mathrm{T}} = \mathbf{I}$, then $F(e^{j\omega})$ is constrained to satisfy the following identity

$$\frac{1}{2\pi}\int_0^{2\pi}|F(e^{j\omega})|^2\,d\omega = 1 \tag{7.64}$$

This follows from Parseval's relation (Chapter 2, Section 2.2). Noting this, we may
say that the diagonal elements of $\mathbf{R}_{\mathcal{T}}$ (i.e., the signal powers at the outputs of the FIR
filters defined by the rows of $\mathcal{T}$) are a set of averaged values of the power spectral
density function, $\Phi_{xx}(e^{j\omega})$, of the underlying input process. The weighting functions used
to obtain these averages are the squared magnitude responses of the FIR filters associated
with the various rows of $\mathcal{T}$.
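The averaging interpretation can be verified numerically. The sketch below takes one row of a DST matrix as the filter and an FIR-coloured white-noise input, and compares the output power computed in the time domain with the frequency-domain average. The function names are ours, and the colouring-filter coefficients are an assumed example (they are those of the lowpass filter $H_1(z)$ used later in this section). Because every integrand here is a trigonometric polynomial of low degree, a uniform grid of 64 points evaluates the integrals exactly up to rounding.

```python
import cmath
import math

def freq_response(coeffs, omega):
    """F(e^{j omega}) for an FIR filter with the given coefficients."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(coeffs))

def dst_row(k, n_taps):
    """Row k (1-based) of an orthonormal DST matrix of size n_taps."""
    scale = math.sqrt(2.0 / (n_taps + 1))
    return [scale * math.sin(k * l * math.pi / (n_taps + 1))
            for l in range(1, n_taps + 1)]

def autocorr(h, lag):
    """r(lag) of unit-variance white noise filtered by the FIR filter h."""
    lag = abs(lag)
    return sum(h[n] * h[n + lag] for n in range(len(h) - lag))

def power_time_domain(f, h):
    """E[y^2] = f^T R f, i.e. a diagonal element of the transformed R."""
    n = len(f)
    return sum(f[a] * f[b] * autocorr(h, a - b)
               for a in range(n) for b in range(n))

def power_freq_domain(f, h, n_grid=64):
    """Average of PSD times |F|^2 over a uniform frequency grid."""
    total = 0.0
    for m in range(n_grid):
        omega = 2.0 * math.pi * m / n_grid
        psd = abs(freq_response(h, omega)) ** 2
        total += psd * abs(freq_response(f, omega)) ** 2
    return total / n_grid

h1 = [0.1, 0.2, 0.3, 0.4, 0.4, 0.2, 0.1]      # assumed colouring filter
f = dst_row(2, 8)
unit_gain = power_freq_domain(f, [1.0])       # white input: Parseval check
p_time = power_time_domain(f, h1)
p_freq = power_freq_domain(f, h1)
print(unit_gain, p_time, p_freq)
```

The first value should equal 1, which is the constraint on each row of an orthonormal transform, and the last two should coincide, confirming that each transformed-tap power is a weighted average of the input spectrum.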

The numerical example that was given in Section 7.3 shows that the DCT is very
effective in decorrelating the samples of the input process, x(n), which was considered
there. A closer look at this particular example is very instructive. Figure 7.5 shows
the power spectral density, $\Phi_{xx}(e^{j\omega})$, of the underlying input process, x(n). The main
characteristic of this process to be noted here is that it is of lowpass nature, that is, most of
its spectral energy is concentrated over low frequencies. We also refer to Figure 7.2, where
the magnitude responses of the DCT filters are shown for N = 8, and note the following
features. The side lobes of the filters whose passbands are over higher frequencies (closer
to 0.5) are smaller than the side lobes of the filters whose passbands are over lower
frequencies (close to zero). This, as we show next, is a very special characteristic of
the DCT, which makes it an effective transform when it is applied for decorrelating
the samples of a process that is dominantly lowpass in nature. To see this, we refer to
Eq. (7.63) and note that when the main (passband) lobe of $F(e^{j\omega})$ lies in the frequency
bands where $\Phi_{xx}(e^{j\omega})$ is large (relative to its values in other frequency bands), the value
of $E[y^2(n)]$ is not much affected by the size of the side lobes of $F(e^{j\omega})$. On the other
hand, when the main lobe of $F(e^{j\omega})$ lies in frequency bands where $\Phi_{xx}(e^{j\omega})$ is relatively
small, the value of $E[y^2(n)]$ may be significantly affected by the side lobes of $F(e^{j\omega})$,
as these side lobes, although small, are multiplied by some large values of $\Phi_{xx}(e^{j\omega})$
before integration. In the context of orthogonal transforms and signal power spreading,
the minimization of the side lobes of $F(e^{j\omega})$ in the latter case to reduce the value of
$E[y^2(n)]$ is very critical. Referring back to the DCT filters and the size of their associated
side lobes, we find that the DCT has the necessary properties to be effective in achieving
a close to maximum signal power spreading when applied to any lowpass signal.

Figure 7.5 Power spectral density of the process x(n) that is generated by the coloring filter
Eq. (7.17).

To get further insight into the above results, we consider two more examples. We
consider two choices of the inputs, $x_1(n)$ and $x_2(n)$, that are generated by passing a unit
variance white noise process through two coloring filters, which are specified by the
system functions

$$H_1(z) = 0.1 + 0.2z^{-1} + 0.3z^{-2} + 0.4z^{-3} + 0.4z^{-4} + 0.2z^{-5} + 0.1z^{-6}$$

and

$$H_2(z) = 0.1 - 0.2z^{-1} - 0.3z^{-2} + 0.4z^{-3} + 0.4z^{-4} - 0.2z^{-5} - 0.1z^{-6}$$

respectively. Figures 7.6 and 7.7 show the power spectral densities of $x_1(n)$ and $x_2(n)$.
We note that $x_1(n)$ and $x_2(n)$ are low- and band-pass processes, respectively.
We also consider two choices of $\mathcal{T}$:


1. The DCT matrix whose coefficients are specified by Eq. (7.14).
2. The discrete sine transform (DST) that is specified by the coefficients

$$s_{kl} = \sqrt{\frac{2}{N+1}}\,\sin\left(\frac{kl\pi}{N+1}\right), \quad 1 \le k, l \le N \tag{7.65}$$

We expect the DCT to perform well when applied to $x_1(n)$ as this is a lowpass process.
Figure 7.8 shows the magnitude responses of the DST filters for N = 8. For the DST,
we observe that the side lobes of the filters whose passbands belong to high or low
frequencies are relatively smaller than the side lobes of the filters whose passbands are
within the midband frequencies. Thus, according to our discussion above, we expect the DST
to perform well when applied to $x_2(n)$.
Table 7.2 shows the results of some numerical calculations that have been performed
to observe the effect of the two transformations in decorrelating the samples of $x_1(n)$
and $x_2(n)$. These results compare the eigenvalue spread of $\mathbf{R}$ and $\mathbf{R}_{\mathcal{T}}^{\mathrm{N}}$ of the respective
processes for three values of filter length, N. Also, to illustrate that the improvement
factor, $I_\rho$, and the variation of eigenvalue spread of the respective matrices are tightly related,
values of $I_\rho$ are also presented in Table 7.2. As predicted, the DCT performs better
for $x_1(n)$, and the DST performs better for $x_2(n)$.

Figure 7.7 Power spectral density of the process $x_2(n)$.

Figure 7.8 The magnitude responses of the DST filters for N = 8.

Table 7.2 Comparison of the DST and DCT transformations when
applied to lowpass process $x_1(n)$ and bandpass process $x_2(n)$.

                    $\lambda_{\max}/\lambda_{\min}$                      $I_\rho$
Process    N      R        R^N_DCT   R^N_DST      DCT      DST      max
x_1(n)     8    375.35      3.01      14.19      15.12    12.10    15.75
x_1(n)    20    781.62      3.52      18.15      43.47    37.55    44.66
x_1(n)    30    945.38      3.81      18.18      67.41    60.06    68.79
x_2(n)     8     50.69      5.97       2.93       3.86     4.75     5.28
x_2(n)    20    184.74     11.78       3.49      15.44    17.86    18.84
x_2(n)    30    253.42     12.41       3.82      25.82    29.08    30.32
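A calculation of this kind can be sketched as follows for the DST and N = 8. This is only an illustration under stated assumptions: the helper names are ours, the log-determinant is computed by plain Gaussian elimination (adequate for the symmetric positive-definite matrices that arise here), and no attempt is made to reproduce the eigenvalue-spread columns, which would need an eigenvalue solver.

```python
import math

def autocorr(h, lag):
    """r(lag) of unit-variance white noise filtered by the FIR filter h."""
    lag = abs(lag)
    return sum(h[n] * h[n + lag] for n in range(len(h) - lag))

def corr_matrix(h, n):
    return [[autocorr(h, i - j) for j in range(n)] for i in range(n)]

def dst_matrix(n):
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(k * l * math.pi / (n + 1))
             for l in range(1, n + 1)] for k in range(1, n + 1)]

def log_det_spd(M):
    """log det of a symmetric positive-definite matrix."""
    A = [row[:] for row in M]
    n, total = len(A), 0.0
    for c in range(n):
        total += math.log(A[c][c])
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return total

def log_rho(M):
    """ln rho(M) = N ln(tr[M]/N) - ln det[M]."""
    n = len(M)
    return n * math.log(sum(M[i][i] for i in range(n)) / n) - log_det_spd(M)

def improvement(h, T):
    """Return (improvement factor of transform T, maximum improvement)."""
    n = len(T)
    R = corr_matrix(h, n)
    diag_rt = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # i-th diagonal element of T R T^T
        diag_rt[i][i] = sum(T[i][a] * R[a][b] * T[i][b]
                            for a in range(n) for b in range(n))
    return log_rho(diag_rt), log_rho(R)

h1 = [0.1, 0.2, 0.3, 0.4, 0.4, 0.2, 0.1]         # lowpass colouring filter
h2 = [0.1, -0.2, -0.3, 0.4, 0.4, -0.2, -0.1]     # bandpass colouring filter
i_dst_1, i_max_1 = improvement(h1, dst_matrix(8))
i_dst_2, i_max_2 = improvement(h2, dst_matrix(8))
print(i_dst_1, i_max_1, i_dst_2, i_max_2)
```

The printed values can be compared with the DST and max columns of the N = 8 rows of Table 7.2. Whatever the input process, the transform's improvement factor must lie between zero and the maximum, which is a useful sanity check on any such calculation.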

Reviewing the above observations, the following guidelines may be drawn for the
selection of the transformation $\mathcal{T}$:


In general, transforms whose associated band-partitioning filters have smaller side lobes are
expected to perform better than those with larger side lobes. When an estimate of the power
spectral density, $\Phi_{xx}(e^{j\omega})$, of the underlying input process, x(n), is known, selection of those
transforms whose associated filters have smaller side lobes within the frequency bands where
$\Phi_{xx}(e^{j\omega})$ is large leads to a more significant performance improvement.


7.7 Transforms 


Although, in general, there are infinitely many possible choices of the transformation matrix T,
only a few transforms have been widely used in practice. The main feature of such
transforms is that there are many fast algorithms for their efficient implementation. They
also exhibit a good signal separation, that is, from the band-partitioning point of view, they
all offer well-behaved sets of parallel FIR filters with approximately mutually exclusive
passbands. In the application of TDAFs, the most commonly used transforms are:


1. DFT. The DFT is the most widely used transform in various applications of signal
processing. The klth element of the DFT transformation matrix, $\mathcal{T}_{\mathrm{DFT}}$, is

$$f_{kl} = \frac{1}{\sqrt{N}}\,e^{-j2\pi kl/N}, \quad \text{for } 0 \le k, l \le N-1 \tag{7.66}$$

The factor $1/\sqrt{N}$ on the right-hand side of Eq. (7.66) is to normalize the DFT coefficients
so that $\mathcal{T}_{\mathrm{DFT}}\mathcal{T}_{\mathrm{DFT}}^{\mathrm{H}} = \mathbf{I}$.
The distinct feature of DFT, compared to other transforms, is that it distinguishes
between positive and negative frequencies. This, among all the widely used
transforms, makes DFT the most effective transform in cases where the underlying input
process has a nonsymmetrical power spectral density with respect to $\omega = 0$, that is, for
complex-valued inputs. If the input is real-valued, then DFT has no advantage over
the other transforms. In fact, its complex-valued coefficients add some unnecessary
redundancy to the transformed signal samples, which increases the complexity of the
system.
2. Real DFT (RDFT). When N is even, the coefficients of RDFT are given by

$$f_{kl} = \begin{cases}
\dfrac{1}{\sqrt{N}}, & k = 0,\ 0 \le l \le N-1 \\[2mm]
\sqrt{\dfrac{2}{N}}\cos\dfrac{2\pi kl}{N}, & 1 \le k \le \tfrac{1}{2}N-1,\ 0 \le l \le N-1 \\[2mm]
\dfrac{1}{\sqrt{N}}(-1)^l, & k = \tfrac{1}{2}N,\ 0 \le l \le N-1 \\[2mm]
\sqrt{\dfrac{2}{N}}\sin\dfrac{2\pi kl}{N}, & \tfrac{1}{2}N+1 \le k \le N-1,\ 0 \le l \le N-1
\end{cases} \tag{7.67}$$

3. Discrete Hartley transform (DHT). The DHT coefficients are defined as

$$h_{kl} = \frac{1}{\sqrt{N}}\left(\cos\frac{2\pi kl}{N} + \sin\frac{2\pi kl}{N}\right), \quad \text{for } 0 \le k, l \le N-1 \tag{7.68}$$

Both RDFT and DHT may be viewed as derivatives of the DFT, which for
real-valued signals exploit the redundancy of the transformed samples and suggest a lower
complexity implementation of TDAFs. Experiments on TDAFs with DFT, RDFT,
and DHT show that they all perform the same when the underlying input process is
real-valued.
4. DCT. There are a few variations of DCT (Ersoy, 1997). However, the most widely
used DCT is the one defined in Eq. (7.14).

5. DST. Similar to DCT, there are also a few variations of DST (Ersoy, 1997). However,
the most widely used DST is the one defined in Eq. (7.65).

6. Walsh-Hadamard transform (WHT). The WHT is defined when the transformation
length, N, is a power of 2. The WHT coefficients are


m—1 


wy = = | | CD OO,  for0<k, I< N-1 (7.69) 


where $m = \log_2 N$, and $b_p(k)$ is the pth bit (with p = 0 referring to the least significant
bit) of the binary representation of k.

The main characteristic of the WHT is its simplicity, as all of its coefficients are
+1 or −1 and, as a result, its implementation does not involve any multiplication. We
note that in the implementation of the TDLMS algorithm, the common coefficient $1/\sqrt{N}$,
which is just a normalization factor, can be dropped as the step-size normalization
of the TDLMS algorithm takes care of signal normalization. The price paid for this
simplicity of the WHT is its higher side lobes compared to other transforms. This,
of course, results in poorer performance of the WHT when applied to TDAFs, in
general.
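All of the transforms listed above are normalized so that the transformation matrix is orthonormal, which is the property the analysis of Section 7.6 relies on. The sketch below builds three of the real-valued transforms from their standard definitions (the real DFT for even N, the discrete Hartley transform, and the DST with the $\sqrt{2/(N+1)}$ scaling) and measures how far $\mathcal{T}\mathcal{T}^{\mathrm{T}}$ is from the identity. The function names are ours.

```python
import math

def rdft_matrix(n):
    """Real DFT; n must be even."""
    rows = []
    for k in range(n):
        if k == 0:
            rows.append([1.0 / math.sqrt(n)] * n)
        elif k < n // 2:
            rows.append([math.sqrt(2.0 / n) * math.cos(2 * math.pi * k * l / n)
                         for l in range(n)])
        elif k == n // 2:
            rows.append([(-1.0) ** l / math.sqrt(n) for l in range(n)])
        else:
            rows.append([math.sqrt(2.0 / n) * math.sin(2 * math.pi * k * l / n)
                         for l in range(n)])
    return rows

def dht_matrix(n):
    """Discrete Hartley transform (cos + sin kernel)."""
    return [[(math.cos(2 * math.pi * k * l / n)
              + math.sin(2 * math.pi * k * l / n)) / math.sqrt(n)
             for l in range(n)] for k in range(n)]

def dst_matrix(n):
    """Discrete sine transform with indices running from 1 to n."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(k * l * math.pi / (n + 1))
             for l in range(1, n + 1)] for k in range(1, n + 1)]

def max_orthogonality_error(T):
    """Largest entry of |T T^T - I|."""
    n = len(T)
    return max(abs(sum(T[i][k] * T[j][k] for k in range(n))
                   - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

errors = {name: max_orthogonality_error(build(8))
          for name, build in (("RDFT", rdft_matrix),
                              ("DHT", dht_matrix),
                              ("DST", dst_matrix))}
print(errors)
```

Each error should be at the level of floating-point rounding. The same check is a cheap safeguard when a transform is coded by hand, since a scaling or indexing slip shows up immediately as a loss of orthonormality.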


7.8 Sliding Transforms 


The conventional fast algorithms available for the implementation of the transforms
introduced in the previous section require $O(N\log N)$ operations (additions, subtractions, or
multiplications), where $O(\cdot)$ denotes "order of", and the term "order of x" means a value
proportional to x with a fixed proportionality constant. In the context of transversal filters
and their corresponding transform domain implementation, there is an important property
of the filter tap-input vector, $\mathbf{x}(n)$, that can be used to reduce the complexity of the latter
transforms further. Namely, when $\mathbf{x}(n) = [x(n)\ x(n-1)\ \cdots\ x(n-N+1)]^{\mathrm{T}}$, $\mathbf{x}(n)$ and
$\mathbf{x}(n+1)$ have N − 1 elements in common. $\mathbf{x}(n+1)$ is obtained from $\mathbf{x}(n)$ by shifting
(sliding) the elements of $\mathbf{x}(n)$ one element down, dropping out the last element of $\mathbf{x}(n)$,
and adding the new sample of input, x(n + 1), as the first element of $\mathbf{x}(n+1)$. In this
section, we exploit this data redundancy in the successive tap-input vectors $\mathbf{x}(n)$ and
$\mathbf{x}(n+1)$ and introduce two O(N) complexity schemes for efficient implementation of
the transformation part of TDAFs. These are called sliding transforms.


7.8.1 Frequency Sampling Filters 


A useful common property of the transforms which were introduced in the last section 
(with the exception of the WHT) is that the transfer functions of their corresponding FIR 
filters can be written in a compact recursive form. These transfer functions can then be 
used for efficient implementation of the respective transforms. To clarify this, we consider 
the DFT filters as an example. 
The transfer function of the kth DFT filter is

$$H^{n,k}_{\mathrm{DFT}}(z) = \sum_{l=0}^{N-1} f_{kl}\, z^{-l} \qquad (7.70)$$

The superscript n in Eq. (7.70) emphasizes that the coefficients $f_{kl}$ have been normalized
so that $\sum_{l=0}^{N-1} |f_{kl}|^2 = 1$.

Substituting Eq. (7.66) in Eq. (7.70), we obtain

$$H^{n,k}_{\mathrm{DFT}}(z) = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} e^{-j2\pi kl/N} z^{-l}
= \frac{1}{\sqrt{N}}\, \frac{1 - (e^{-j2\pi k/N} z^{-1})^N}{1 - e^{-j2\pi k/N} z^{-1}}
= \frac{1}{\sqrt{N}}\, \frac{1 - z^{-N}}{1 - e^{-j2\pi k/N} z^{-1}} \qquad (7.71)$$

When the TDLMS algorithm is used to adapt a DFT-based TDAF, the constant factor
$1/\sqrt{N}$ may be dropped from the right-hand side of Eq. (7.71) as signal normalization
is taken care of by the step-normalization in the TDLMS algorithm, as discussed in the
earlier sections. Thus, the (unnormalized) transfer function of the kth DFT filter may be
defined as

$$H^{k}_{\mathrm{DFT}}(z) = \frac{1 - z^{-N}}{1 - e^{-j2\pi k/N} z^{-1}} \qquad (7.72)$$


The transfer functions associated with other transforms can also be derived in a similar
way. For transforms with real-valued coefficients, one has to start with expanding the sine
and cosine coefficients in terms of their associated complex exponentials, then proceed as
in the case of DFT filters and combine the results. At the end, any fixed scale factor in front
of the final results is dropped.
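The procedure described above can be checked numerically. The following Python sketch assumes the DCT kernel $\cos(\pi k(2l+1)/(2N))$ (the usual DCT-II form; the chapter's definition may differ by a constant scale) and compares the direct FIR sum with the compact recursive form once the dropped scale factor $\cos(\pi k/(2N))$ is restored. The function names are illustrative only.

```python
import cmath
import math

def dct_fir(k, N, z):
    """Direct FIR sum: sum over l of cos(pi*k*(2l+1)/(2N)) * z^{-l}."""
    return sum(math.cos(math.pi * k * (2 * l + 1) / (2 * N)) * z ** (-l)
               for l in range(N))

def dct_compact(k, N, z):
    """Compact form, with the scale factor cos(pi*k/(2N)) put back."""
    theta = math.pi * k / N
    num = (1 - z ** -1) * (1 - (-1) ** k * z ** (-N))
    den = 1 - 2 * math.cos(theta) * z ** -1 + z ** -2
    return math.cos(theta / 2) * num / den

N = 8
z = 1.1 * cmath.exp(0.7j)   # arbitrary test point away from the unit-circle poles
max_err = max(abs(dct_fir(k, N, z) - dct_compact(k, N, z)) for k in range(N))
print(max_err)
```

Evaluating both forms at a point off the unit circle avoids the pole-zero cancellation points, where the compact form is numerically 0/0.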

Table 7.3 gives a summary of the transfer functions that are associated with various 
transforms. We have not included WHT here as its transfer functions do not have any 
closed-form equivalent. Hence, a different approach has to be adopted to arrive at an 
efficient implementation of the WHT. This is discussed in Problem P7.19. 

The term frequency sampling filter is used to refer to the filters defined by the transfer 
functions given in Table 7.3. This is because each transfer function corresponds to a 
narrow-band filter which samples a small band of the spectrum of the underlying input 
process. 

Table 7.3 provides all the necessary information for the development of the two real- 
izations of the sliding transforms which are categorized as recursive and nonrecursive 
structures, and are discussed below. 


7.8.2 Recursive Realization of Sliding Transforms 


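A direct realization of the transfer functions given in Table 7.3 suggests a simple recursive
scheme for the implementation of the associated transforms. As an example, we present
here a recursive realization of the DCT filters. Recursive realization of the other transforms
which follow the same concept is then straightforward.

Table 7.3 Transfer functions associated with the various transforms (frequency sampling filters).

DFT:
$$H^{k}_{\mathrm{DFT}}(z) = \frac{1 - z^{-N}}{1 - e^{-j2\pi k/N} z^{-1}}$$

RDFT:
$$H^{k}_{\mathrm{RDFT}}(z) =
\begin{cases}
\dfrac{1 - z^{-N}}{1 - z^{-1}}, & \text{for } k = 0 \\[2ex]
\dfrac{\left(1 - \cos\frac{2\pi k}{N}\, z^{-1}\right)(1 - z^{-N})}{1 - 2\cos\frac{2\pi k}{N}\, z^{-1} + z^{-2}}, & \text{for } 1 \le k \le \tfrac{1}{2}N - 1 \\[2ex]
\dfrac{1 - z^{-N}}{1 + z^{-1}}, & \text{for } k = \tfrac{1}{2}N \\[2ex]
\dfrac{z^{-1}(1 - z^{-N})}{1 - 2\cos\frac{2\pi k}{N}\, z^{-1} + z^{-2}}, & \text{for } \tfrac{1}{2}N + 1 \le k \le N - 1
\end{cases}$$

DHT:
$$H^{k}_{\mathrm{DHT}}(z) = \frac{\left(1 - \left(\cos\frac{2\pi k}{N} - \sin\frac{2\pi k}{N}\right) z^{-1}\right)(1 - z^{-N})}{1 - 2\cos\frac{2\pi k}{N}\, z^{-1} + z^{-2}}$$

DCT:
$$H^{k}_{\mathrm{DCT}}(z) = \frac{(1 - z^{-1})(1 - (-1)^k z^{-N})}{1 - 2\cos\frac{\pi k}{N}\, z^{-1} + z^{-2}}$$

DST:
$$H^{k}_{\mathrm{DST}}(z) = \frac{1 - (-1)^k z^{-(N+1)}}{1 - 2\cos\frac{\pi k}{N+1}\, z^{-1} + z^{-2}}$$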
From Table 7.3, we have 


$$H^{k}_{\mathrm{DCT}}(z) = \frac{(1 - z^{-1})(1 - (-1)^k z^{-N})}{1 - 2\cos\frac{\pi k}{N}\, z^{-1} + z^{-2}} \qquad (7.73)$$


This is the transfer function of the kth DCT filter. Figure 7.9 depicts a detailed realization
of Eq. (7.73). In this realization, we have purposefully divided the transfer function
$H^{k}_{\mathrm{DCT}}(z)$ into three separate parts. Namely, the forward parts, $1 - z^{-1}$ and $1 - (-1)^k z^{-N}$,
and the feedback part, $1/(1 - 2\cos(\pi k/N)\, z^{-1} + z^{-2})$. This separation facilitates the integration of the
DCT filters (for k = 0, 1, ..., N - 1) in a parallel structure.

Figure 7.10 depicts a block diagram of the DCT frequency sampling filters when they 
are put together in a parallel structure. Points to be noted here are: 


1. For k = 0,

$$\frac{1}{1 - 2\cos\frac{\pi k}{N}\, z^{-1} + z^{-2}} = \frac{1}{(1 - z^{-1})^2}$$

Substituting this result in Eq. (7.73), we obtain

$$H^{0}_{\mathrm{DCT}}(z) = \frac{1 - z^{-N}}{1 - z^{-1}} \qquad (7.74)$$


This has been considered in the block diagram of Figure 7.10 and, thus, the case k = 0 
has been treated separately. 
Figure 7.9 A realization of $H^{k}_{\mathrm{DCT}}(z)$.


2. We also note that

$$1 - (-1)^k z^{-N} =
\begin{cases}
1 - z^{-N}, & \text{for } k \text{ even} \\
1 + z^{-N}, & \text{for } k \text{ odd}
\end{cases}$$

Thus, the cases of k even and odd are separated at the first stage of Figure 7.10.
However, when implementing the structure of Figure 7.10, one should note that the
blocks $1 - z^{-N}$ and $1 + z^{-N}$ have the same common input and, thus, can share the
same delay line to hold the past samples of the input. This reduces the memory
requirement of the system.
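The parallel recursive structure described in the two points above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the filters are the unnormalized DCT filters of Eq. (7.73), and the reference output uses the impulse response $\cos(\pi k(2l+1)/(2N))/\cos(\pi k/(2N))$, which is what Eq. (7.73) gives when the DCT-II kernel is assumed. The stabilizing measures discussed next are deliberately left out, so the sketch is suitable only for short runs.

```python
from collections import deque
import math

def sliding_dct_recursive(x, N):
    """Run the N recursive DCT frequency sampling filters over the list x."""
    delay = deque([0.0] * N, maxlen=N)      # shared delay line: x(n-1), ..., x(n-N)
    e_prev = o_prev = 0.0
    y1 = [0.0] * N                          # outputs at time n-1
    y2 = [0.0] * N                          # outputs at time n-2
    coef = [2.0 * math.cos(math.pi * k / N) for k in range(N)]
    out = []
    for xn in x:
        x_old = delay[-1]                   # x(n-N)
        e = xn - x_old                      # block 1 - z^{-N} (k even)
        o = xn + x_old                      # block 1 + z^{-N} (k odd)
        de, do = e - e_prev, o - o_prev     # forward part 1 - z^{-1}
        y = [0.0] * N
        y[0] = y1[0] + e                    # k = 0 treated separately, Eq. (7.74)
        for k in range(1, N):
            drive = de if k % 2 == 0 else do
            y[k] = coef[k] * y1[k] - y2[k] + drive   # feedback part
        out.append(y)
        y2, y1 = y1, y
        e_prev, o_prev = e, o
        delay.appendleft(xn)
    return out

def sliding_dct_direct(x, N):
    """Reference: direct FIR filtering with the equivalent impulse responses."""
    out = []
    for n in range(len(x)):
        row = []
        for k in range(N):
            scale = math.cos(math.pi * k / (2 * N))
            row.append(sum(math.cos(math.pi * k * (2 * l + 1) / (2 * N)) / scale * x[n - l]
                           for l in range(N) if n - l >= 0))
        out.append(row)
    return out

N = 8
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(40)]
rec = sliding_dct_recursive(x, N)
ref = sliding_dct_direct(x, N)
max_diff = max(abs(a - b) for r, d in zip(rec, ref) for a, b in zip(r, d))
print(max_diff)
```

The recursive version needs a fixed number of operations per output and per input sample, whereas the direct reference costs on the order of N operations per output.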


A common problem with the recursive realization of the frequency sampling filters 
that needs careful attention is that these filters are only marginally stable. They can 
easily run into instability problems, unless some special care is taken to ensure stability. 
This is because the poles of the frequency sampling filters are all on the unit circle 
and, as a result, any round-off error will accumulate and grow unbounded. Furthermore, 
quantization of the filter coefficients may result in poles outside the unit circle and thus 
result in unstable filters. 

The above problem can be alleviated by replacing $z^{-1}$ with $\beta z^{-1}$, where $\beta$ is a constant
smaller than, but close to, 1. This shifts all the poles and zeros of the frequency sampling
filters, which are ideally on the unit circle, to a circle with radius $\beta < 1$. This stabilizes
the filters at the cost of some additional complexity in their realization, as the introduction of $\beta$
changes some of the filter coefficients which otherwise would have been unity.
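As an illustration of the stabilizing substitution, the Python sketch below applies $z^{-1} \to \beta z^{-1}$ to the DFT filter of Eq. (7.72), which gives $(1 - \beta^N z^{-N})/(1 - \beta e^{-j2\pi k/N} z^{-1})$, and checks the resulting recursion against the exponentially weighted window it is equivalent to. The function names and the values of N, k and $\beta$ are assumptions chosen for the example.

```python
import cmath

def sliding_dft_damped(x, N, k, beta):
    """Recursive kth DFT filter with z^{-1} replaced by beta * z^{-1}."""
    pole = beta * cmath.exp(-2j * cmath.pi * k / N)   # pole moved to radius beta
    tail = beta ** N                                  # zero-forming term beta^N z^{-N}
    y = 0.0 + 0.0j
    out = []
    for n, xn in enumerate(x):
        x_old = x[n - N] if n >= N else 0.0
        y = pole * y + xn - tail * x_old
        out.append(y)
    return out

def damped_window_sum(x, N, k, beta, n):
    """Equivalent FIR output: sum over l of (beta * W)^l * x(n - l), l = 0..N-1."""
    w = beta * cmath.exp(-2j * cmath.pi * k / N)
    return sum(w ** l * x[n - l] for l in range(N) if n - l >= 0)

N, k, beta = 16, 3, 0.999
x = [((7 * n) % 11) / 11.0 - 0.5 for n in range(200)]   # arbitrary deterministic test input
rec = sliding_dft_damped(x, N, k, beta)
max_diff = max(abs(rec[n] - damped_window_sum(x, N, k, beta, n)) for n in range(len(x)))
print(max_diff)
```

Note the two extra multiplications, by $\beta$ in the feedback path and by $\beta^N$ on the delayed input, which is the additional complexity referred to in the text.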
Figure 7.10 A parallel realization of the recursive DCT frequency sampling filters.


7.8.3 Nonrecursive Realization of Sliding Transforms 


The nonrecursive sliding transforms, which are introduced in this section, use the following
common property of the frequency sampling filters:


The frequency sampling filters associated with each transform have a common set of zeros
out of which each filter selects N - 1.


Bruun (1978) noted the significance of the above property in the case of the DFT and used
it to develop a fast Fourier transform (FFT) structure. Farhang-Boroujeny et al. (1996)
noted that a rearrangement of Bruun's algorithm leads to a sliding DFT structure
and extended the concept to the other transforms. In the rest of this section, we present
the sliding transforms that have been proposed in (Farhang-Boroujeny et al. 1996) and
demonstrate their efficiency in the implementation of TDAFs.
Bruun's Algorithm as Sliding DFT


The transfer functions of the DFT frequency sampling filters are (from Table 7.3): 


$$H^{k}_{\mathrm{DFT}}(z) = \frac{1 - z^{-N}}{1 - e^{-j2\pi k/N} z^{-1}}, \quad \text{for } k = 0, 1, \ldots, N - 1 \qquad (7.75)$$


We note that the zeros of these filters are all taken from the set of Nth roots of unity,
that is, $e^{j2\pi k/N}$, for k = 0, 1, ..., N - 1. We also note that each DFT filter has one pole
that belongs to the same set. As a result, we find that a pole-zero cancellation occurs and,
thus, each DFT filter has effectively N - 1 zeros out of the set of Nth roots of unity and
no pole.

Bruun used this simple concept and suggested an elegant factorization of $1 - z^{-N}$ and
used these results to form a tree structure, as shown in Figure 7.11 (for N = 16), to realize
the various FIR frequency sampling filters of DFT. The following identities are used for
the factorization of $1 - z^{-N}$:


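$$1 - z^{-2M} = (1 - z^{-M})(1 + z^{-M}) \qquad (7.76)$$

and

$$1 + a z^{-2M} + z^{-4M} = \left(1 + \sqrt{2 - a}\, z^{-M} + z^{-2M}\right)\left(1 - \sqrt{2 - a}\, z^{-M} + z^{-2M}\right) \qquad (7.77)$$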


These factorizations, which are used until the last stage of the tree structure, have the 
following two features: 


1. Each factor consists of either two or three sparse taps. 
2. There is at most one nontrivial real-valued coefficient in each factor. 


To see how the above identities could be used to develop the tree structure
of Figure 7.11, we note that the factors which appear in the first stage are those of
$1 - z^{-16} = (1 - z^{-8})(1 + z^{-8})$. The branches that follow after the factor $1 + z^{-8}$ are made
of the factors of the other branch of the first stage, that is, $1 - z^{-8} = (1 - z^{-4})(1 + z^{-4})$.
Similarly, the branches that follow the factor $1 - z^{-8}$ are made of the factors of
$1 + z^{-8} = (1 + \sqrt{2} z^{-2} + z^{-4})(1 - \sqrt{2} z^{-2} + z^{-4})$. The same procedure is used to
determine the other branches of the structure. At the end of the third stage (in our particular
example), each path of the tree covers 14 out of the 16 zeros of $1 - z^{-16}$. The remaining
two zeros that have not been covered by each path are complex conjugates, except for
the top path whose corresponding missing zeros are $z = \pm 1$. One out of the two missing
zeros is, then, added at the last stage. The same procedure can be used to develop the
same structure for any value of N (the transform length), which is a power of 2.
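The factorization can be verified with a few lines of Python. The sketch below applies the two identities, Eqs. (7.76) and (7.77), recursively to $1 - z^{-16}$ and multiplies the resulting sparse real factors back together. Polynomials in $z^{-1}$ are stored as coefficient lists; the function names are illustrative, and the final split of each second-order factor into complex conjugate first-order terms (the last stage of the tree) is not performed here.

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials in z^{-1} given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def factor_trinomial(a, mid):
    """Factor 1 + a z^{-mid} + z^{-2 mid} (mid a power of 2) with Eq. (7.77)."""
    if mid == 1:
        return [[1.0, a, 1.0]]              # second-order real factor, left as is
    b = math.sqrt(2.0 - a)
    return factor_trinomial(b, mid // 2) + factor_trinomial(-b, mid // 2)

def factor_one_minus(n):
    """Factor 1 - z^{-n} (n a power of 2) with Eqs. (7.76) and (7.77)."""
    if n == 1:
        return [[1.0, -1.0]]
    half = n // 2
    if half == 1:
        plus = [[1.0, 1.0]]                 # 1 + z^{-1}
    else:
        plus = factor_trinomial(0.0, half // 2)   # 1 + z^{-half} is the case a = 0
    return factor_one_minus(half) + plus

N = 16
factors = factor_one_minus(N)
product = [1.0]
for f in factors:
    product = poly_mul(product, f)
target = [1.0] + [0.0] * (N - 1) + [-1.0]   # coefficients of 1 - z^{-16}
err = max(abs(p - t) for p, t in zip(product, target))
print(len(factors), err)
```

For N = 16 this yields two first-order factors ($1 - z^{-1}$ and $1 + z^{-1}$) and seven second-order factors, each with at most one nontrivial real coefficient.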

Bruun (1978) elaborated on the tree structure of Figure 7.11 and proposed his FFT 
structure. In the context of the TDLMS algorithm, we are interested in an efficient imple- 
mentation of the DFT frequency sampling filters and updating their outputs after the 
arrival of every new data sample. The tree structure of Figure 7.11 is exactly what we 
are looking for. Thus, we hold on to this structure as an efficient way of implementing 
the nonrecursive sliding DFT filters. 
Figure 7.11 Nonrecursive sliding DFT: N = 16 (Bruun, 1978).


To appreciate the efficiency of the structure given in Figure 7.11, we shall elaborate on
it further. We note that the pair of filters that originate from a common node at any stage
share the same coefficients and, thus, they can be implemented jointly, as depicted in
Figure 7.12. For a real-valued sequence, this implementation requires only one multiplication
and three additions. For a complex-valued input, the number of operations is twice
this figure. We may also note that each filter pair at the output stage in Figure 7.11 uses
a pair of complex conjugate coefficients, and therefore, the corresponding multiplications
can be shared. Figure 7.13 depicts a joint implementation of a filter pair of the output
stage of Figure 7.11. In this implementation, $c_R$ and $c_I$ denote the real and imaginary parts
of $c$, respectively, where $c$ and $c^*$ are the pair of filter coefficients. For a complex-valued
input, this implementation requires four real multiplications and six real additions.


Figure 7.12 An implementation of the filter pair $1 \pm c z^{-M} + z^{-2M}$, when they share a common
input.


Figure 7.13 An implementation of the filter pair $(1 - c z^{-1},\; 1 - c^* z^{-1})$, when they share a
common input.
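The operation count quoted for the joint implementation of Figure 7.12 can be made concrete with a short Python sketch. It is a minimal illustration for a real-valued input; the function name and the use of M for the tap spacing are assumptions made for the example.

```python
def filter_pair(x, c, M):
    """Outputs of 1 + c z^{-M} + z^{-2M} and 1 - c z^{-M} + z^{-2M} for a shared real input."""
    plus, minus = [], []
    for n in range(len(x)):
        x_m = x[n - M] if n >= M else 0.0            # x(n - M)
        x_2m = x[n - 2 * M] if n >= 2 * M else 0.0   # x(n - 2M)
        s = x[n] + x_2m        # addition 1 (shared by both filters)
        p = c * x_m            # the only multiplication (shared by both filters)
        plus.append(s + p)     # addition 2
        minus.append(s - p)    # addition 3
    return plus, minus

p, m = filter_pair([1.0, 2.0, -1.0, 0.5], 2.0, 1)
print(p, m)
```

Computing the two outputs separately would take two multiplications and four additions per sample, so the shared form halves the multiplications.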


Real-Valued Transforms 


As was noted before, when the filter input is real-valued, about 50% of the DFT outputs 
are redundant as they appear in complex conjugate pairs. In such situations, transforms 
with real-valued coefficients are preferred. Following Bruun’s factorization technique, it 
is not difficult to come up with tree structures similar to the one in Figure 7.11 for other
transforms. Figures 7.14 to 7.17 show a set of such tree structures for nonrecursive sliding
RDFT, DHT, DCT, and DST, respectively. Note that for the examples shown, the value
of N is 16 for RDFT, DHT, and DCT, and 15 in the case of DST. Further details on
these structures, along with some efficient programming techniques for their software 
implementations, can be found in (Farhang-Boroujeny et al. 1996). 
Figure 7.14 Nonrecursive sliding RDFT: N = 16 (Farhang-Boroujeny et al. 1996).


7.8.4 Comparison of Recursive and Nonrecursive Sliding Transforms 


In terms of robustness to numerical round-off errors, the nonrecursive sliding transforms
are superior to their recursive counterparts. A simple inspection of the nonrecursive sliding
structures shows that each output in these structures is calculated based on a very limited
number of multiplications and additions. Furthermore, there is no feedback of numerical
errors, thereby avoiding error accumulation. This property, which is inherent to all
FFT-like structures, results in very low sensitivity to finite wordlength effects (Rabiner
and Gold, 1975; Oppenheim and Schafer, 1975). On the contrary, the recursive sliding
transforms are highly sensitive to numerical error accumulation, because of the feedback.
The variances of such errors are proportional to $1/(1 - \beta^2)$, where $\beta$ is the stabilizing factor
as defined before. Noting that $\beta$ has to be selected close to 1 so that the deviation of
the realized filters from the ideal frequency sampling filters would be minimum, these
variances can be excessively large.

Figure 7.15 Nonrecursive sliding DHT: N = 16 (Farhang-Boroujeny et al. 1996).

Figure 7.16 Nonrecursive sliding DCT: N = 16 (Farhang-Boroujeny et al. 1996).

Figure 7.17 Nonrecursive sliding DST: N = 15 (Farhang-Boroujeny et al. 1996).
Table 7.4 Computation counts of the nonrecursive and recursive sliding transforms.

             Nonrecursive                                Recursive
Transform    Mults                 Adds/Subs             Mults          Adds/Subs
DFT          3N - 2m - 8           6N - 2m - 8           4N - 6         4N - 6
RDFT         N - m - 2             2N - m - 2            [illegible]    [illegible]
DHT          [illegible] - m - 6   [illegible] - m - 4   3N - 7         3N - 7
DST          N - m                 3N - m - 3            2N             2N + 1
DCT          N - m - 1             3N - 5                2N + 1         2N + 2

$m = \log_2 N$ for DFT, RDFT, DHT, and DCT
$m = \log_2 (N + 1)$ for DST


In terms of the number of operations per input sample, the nonrecursive sliding
transforms are also found to be superior to their recursive counterparts. Table 7.4 gives the
details of the operation counts of the two schemes. For the case of recursive implementations,
the figures given in Table 7.4 have taken into account the effect of the stabilizing
factor $\beta$.

The major drawback of the nonrecursive sliding transforms is that they are limited to 
the cases where the filter length, N, (filter length plus one in the case of DST) is a power 
of 2. On the contrary, the recursive sliding transforms can be used for any value of N. 


7.9 Summary and Discussion 


In this chapter, we reviewed a class of adaptive filters known as TDAFs. We gave a 
filtering interpretation of orthogonal transforms and demonstrated that a transformation 
may be viewed as a bank of bandpass filters, which are used to separate different parts 
of the spectrum of the underlying input process. This led to a band-partitioning view of 
orthogonal transforms. It was thus concluded that the outputs from an orthogonal trans- 
formation constitute a set of partially decorrelated processes as they belong to (partially) 
mutually exclusive bands. 

Implementation of the LMS algorithm in transform domain was then presented. This 
was called transform domain LMS (TDLMS) algorithm. It was shown that significant 
improvement in convergence behavior of the TDLMS algorithm can be achieved if 
a proper set of normalized step-size parameters is used. This, which was called step- 
normalization, is assumed to be part of the TDLMS algorithm. 
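For readers who want to experiment, the following Python sketch shows one possible form of the TDLMS recursion with step-normalization. It is not a reproduction of the chapter's listing: the orthonormal DCT-II matrix, the exponentially weighted power estimate with forgetting factor `beta`, the regularizer `eps`, the test plant and all parameter values are assumptions made for the example.

```python
import math
import random

def dct_matrix(N):
    """Orthonormal DCT-II matrix; row k holds the kth basis vector."""
    T = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        T.append([scale * math.cos(math.pi * k * (2 * l + 1) / (2 * N)) for l in range(N)])
    return T

def tdlms(x, d, N, mu, beta=0.99, eps=1e-8):
    """Transform domain LMS with per-bin step normalization (illustrative form)."""
    T = dct_matrix(N)
    w_T = [0.0] * N                  # transform domain tap weights
    power = [1.0] * N                # running power estimate of each transform output
    taps = [0.0] * N                 # tap-input vector x(n), x(n-1), ..., x(n-N+1)
    for xn, dn in zip(x, d):
        taps = [xn] + taps[:-1]
        x_T = [sum(T[k][l] * taps[l] for l in range(N)) for k in range(N)]
        e = dn - sum(w * v for w, v in zip(w_T, x_T))
        for k in range(N):
            power[k] = beta * power[k] + (1.0 - beta) * x_T[k] ** 2
            w_T[k] += 2.0 * mu * e * x_T[k] / (power[k] + eps)
    # Equivalent time domain tap weights: w = T^T w_T
    return [sum(T[k][l] * w_T[k] for k in range(N)) for l in range(N)]

random.seed(0)
plant = [0.4, 1.0, -0.3, 0.0]                       # hypothetical noise-free plant
white = [random.gauss(0.0, 1.0) for _ in range(8001)]
x = [white[n] + 0.8 * white[n - 1] for n in range(1, 8001)]   # mildly colored input
d = [sum(plant[l] * x[n - l] for l in range(len(plant)) if n - l >= 0)
     for n in range(len(x))]
w = tdlms(x, d, N=4, mu=0.02)
print([round(v, 6) for v in w])
```

Dividing each update by the running power of its transform output is what equalizes the convergence rates of the different bins; with the division removed, the recursion reduces to the conventional LMS algorithm applied to the transformed tap inputs.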

We showed that the TDLMS algorithm could equivalently be reformulated by normal- 
izing the transformed samples of the underlying input process to the power of unity and 
then using the conventional LMS algorithm (with a single step-size parameter for all taps) 
to adapt the filter tap weights. This formulation is theoretically of interest as it allows 
one to use the results of the conventional LMS algorithm in evaluating the performance 
of the TDLMS algorithm. 
The ideal LMS-Newton algorithm was introduced as a stochastic implementation of
the Newton search method of Chapter 5. The relationship between the TDLMS and ideal
LMS-Newton algorithms was also established. We found that the TDLMS algorithm is
in fact an approximation to the ideal LMS-Newton algorithm.

We noted that for a given input process, the success of different transforms in decor- 
relating the samples of an input process varies. We presented a theory that relates the 
signal decorrelation property of orthogonal transforms to the distribution of signal powers 
after transformation. We demonstrated how this concept is related to the Karhunen-Loève
transform and drew some general guidelines for the selection of an appropriate transform 
when a rough estimate of the power spectral density of the underlying input process 
is known. 

We also introduced various standard transforms that can be implemented efficiently 
using fast transforms. The sliding implementation of these transforms was then presented.
We found that in the application of TDAFs, the commonly used transforms can all
be implemented with an order of N computational complexity, where N is the filter length. 


Problems 


P7.1 Figure P7.1 shows the power spectral densities of four processes and the magni- 
tude responses of their associated eigenfilters for N = 5, in some arbitrary order. 
Considering the maximum signal-power-spreading property of the KLT, identify 
the magnitude response associated with each power spectral density. 


P7.2 By substituting for past values of $\hat{\sigma}^2_{x_{T,i}}(n)$ in Eq. (7.28), show that $\hat{\sigma}^2_{x_{T,i}}(n)$ is an
exponentially weighted average of the present and past samples of $x^2_{T,i}(n)$, using
the weighting function characterized by the coefficients $1, \beta, \beta^2, \ldots$, that is,

$$\hat{\sigma}^2_{x_{T,i}}(n) = \frac{\sum_{k=0}^{\infty} \beta^k x^2_{T,i}(n - k)}{\sum_{k=0}^{\infty} \beta^k}$$


P7.3 Assume a noisy sinusoidal sequence $s(n) = a \sin(\omega n + \phi) + v(n)$, where $v(n)$
is an uncorrelated noise sequence. The angular frequency $\omega$ is known a priori.
However, the magnitude $a$ and phase $\phi$ are unknown. To obtain an estimate
of these parameters, a two-tap transversal filter whose input is chosen to be
$u(n) = \sin \omega n$ is set up and its tap weights, $w_0(n)$ and $w_1(n)$, are adapted so
that the difference between $s(n)$ and the filter output, $y(n)$, is minimized in the
mean-square sense. The filter output, $y(n)$, is then a noise-free estimate of the
sinusoidal sequence. The LMS algorithm is used for this purpose.


(i) Using time averages, find the correlation matrix R of the filter tap inputs.
(ii) Find the step-size parameter, $\mu$, of the LMS algorithm that results in 5%
misadjustment.
(iii) For the step-size parameter obtained in (ii), find the time constants of the
learning curve of the filter and show that the convergence of the LMS algorithm
becomes slower as $\omega$ decreases.
(iv) Show that the problem of slow convergence of the LMS algorithm can be
solved if a TDLMS algorithm with the transformation matrix

$$T = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$$

is used.


P7.4 An adaptive transversal filter is excited by two different inputs, u(n) and v(n), 
whose power spectral densities are presented in Figure P7.4a and b. 


(i) If the LMS algorithm is used in both cases and its step-size parameter is 
selected accordingly for a fixed level of misadjustment (say, 10%), which 
of the two inputs will result in the shortest transient time for the algorithm? 
Explain. 

(ii) What will be your answer to (i), if a DCT-based transform domain imple- 
mentation of the adaptive filter is employed? 

(iii) Will your answer to (ii) change, if the DCT is replaced by DST? 


Figure P7.4 [Two panels, (a) and (b); horizontal axis: frequency, f, from 0 to 0.5.]


P7.5 Figure P7.5 shows the structure of a special adaptive filter, whose tap inputs are 
the samples of the processes u(n) and v(n), which are generated from a stationary 
input process x(n) as shown. Assume that the filter length, N, is an even number. 


(i) Define the length N column vector

$$\tilde{\mathbf{x}}(n) = [u(n)\; v(n)\; u(n-2)\; v(n-2)\; \cdots\; u(n-N+2)\; v(n-N+2)]^T$$

and show that

$$\tilde{\mathbf{x}}(n) = T_1 \mathbf{x}(n)$$

where

$$T_1 = \begin{bmatrix}
1 & 1 & 0 & 0 & \cdots & 0 & 0 \\
1 & -1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 1 & \cdots & 0 & 0 \\
0 & 0 & 1 & -1 & \cdots & 0 & 0 \\
\vdots & & & & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 1 \\
0 & 0 & 0 & 0 & \cdots & 1 & -1
\end{bmatrix}$$

(ii) Show that $T_1$ is an orthogonal matrix and, thus, conclude that the structure
presented in Figure P7.5 corresponds to a TDAF with $T = T_1$.

(iii) You may note that $T_1 T_1^T = 2I$. This is different from the unitary condition
$T T^T = I$, which is usually assumed for the transformation matrix T. Does
this deviation affect the performance of the TDLMS algorithm?

(iv) If the TDLMS algorithm (with the step-normalization) is to be used for fast 
adaptation of this structure, give the details of equations required for such 
implementation. 

(v) Compare the structure of Figure P7.5 with that of a conventional LMS- 
based transversal adaptive filter both in terms of computational complexity 
and memory requirement. 


Figure P7.5
P7.6 Generalization of the adaptive filter structure given in Problem P7.5 may be done
as follows. Define the N-by-N matrix

$$T = \begin{bmatrix}
T_{sub} & 0 & 0 & \cdots & 0 \\
0 & T_{sub} & 0 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & T_{sub}
\end{bmatrix}$$

where $T_{sub}$ is an orthogonal square matrix, and 0's are zero matrices of appropriate
dimensions.


(i) Show that T is an orthogonal matrix.

(ii) Considering the analogy between the transformation matrix T here and the
one in Problem P7.5, construct a generalized version of Figure P7.5.

(iii) Noting that, in general, larger matrices achieve a higher degree of signal
decorrelation (orthogonalization), discuss the convergence behavior of
the proposed structure as the size of $T_{sub}$ increases.

(iv) Discuss the memory requirement and computational complexity of the
proposed structure as the size of $T_{sub}$ increases.


P7.7 Show that the identity $T T^T = I$ implies that the eigenvalues of R and $R_T = T R T^T$
are the same. Thus, conclude that $\rho(R) = \rho(R_T)$.


P7.8 With reference to the notations in Section 7.6, show that


I, max − 2r = In p(R2).


p,max


P7.9 Consider a two-tap transversal filter that is characterized by the performance
function

$$\xi(v_0, v_1) = 0.1 + [v_0\; v_1]\, R \begin{bmatrix} v_0 \\ v_1 \end{bmatrix}$$

where

$$R = \begin{bmatrix} 1 & \alpha \\ \alpha & 1 \end{bmatrix}$$

Assume that the filter input is a real-valued random process.


(i) Find the points (a, 0) and (0, b) where the contour ellipse, given by
$\xi(v_0, v_1) = 1.1$, meets the $v_0$ and $v_1$ axes and show that a = b. Show that this
result is directly related to the fact that the diagonal elements of R are the
same which, in turn, implies that the signal energies at various taps of the
filter are equal.

(ii) By sketching an arbitrary ellipse that passes through the points (a, 0) and
(0, b) of (i), verify that the principal axes of the sketched ellipse are always
in the directions obtained by 45° rotation of the coordinate axes $v_0$ and $v_1$.


5 This problem has been designed based on the work of Petraglia and Mitra (1993).
(iii) Define an orthogonal transformation matrix

$$T = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

and show that the transformation $\mathbf{v}_T = T\mathbf{v}$, where $\mathbf{v} = [v_0\; v_1]^T$, is equivalent
to rotating the coordinate axes $v_0$ and $v_1$ by $\theta$ radians counterclockwise.

(iv) Find the rotation angle $\theta$ that maximizes the ratio of the diagonal elements
of $R_T = T R T^T$ and show that it is independent of $\alpha$.

(v) Noting that the diagonal elements of $R_T$ are the input signal energies after
transformation, comment on your results in (iv) and show that for two-tap
transversal filters with real-valued input processes, the optimum transformation
matrix, $T_{opt}$, is fixed and independent of the statistics of the underlying
input process. What is $T_{opt}$?


P7.10 The autocorrelation matrix R of the input process to an adaptive filter is known.
To use this information to speed up the adaptation of the filter, the following
algorithm is proposed.⁶

$$\mathbf{x}_T(n) = T\mathbf{x}(n)$$
$$y(n) = \mathbf{w}_T^T(n)\,\mathbf{x}_T(n)$$
$$e(n) = d(n) - y(n)$$
$$\mathbf{w}_T(n+1) = \mathbf{w}_T(n) + 2\mu e(n)\,\mathbf{x}_T(n)$$

where $T = R^{-1/2}$, which is the inverse of the square root of R (as defined in
Chapter 4), and $\mu$ is a scalar step-size parameter. Note that the matrix T here is
not an orthogonal matrix, and, thus, the proposed algorithm is different from the
TDLMS algorithm introduced in this chapter. In particular, we may note that the
proposed algorithm does not have any step-normalization.


(i) Obtain the correlation matrix of the transformed samples, $\mathbf{x}_T(n)$, and discuss
the significance of $T = R^{-1/2}$ in increasing the speed of convergence of
the adaptive filter.

(ii) Give an approximate equation for the misadjustment of the proposed algorithm.

(iii) Define $\mathbf{w}(n) = R^{-1/2}\mathbf{w}_T(n)$ and use that to show that the proposed algorithm
is equivalent to the ideal LMS-Newton algorithm.


P7.11 In Section 7.8.1, a derivation of the DFT frequency sampling filters was given.
Following the procedure used there, derive the rest of the system functions listed
in Table 7.3.


P7.12 Derive a sliding DFT structure for the case where N = 8.

P7.13 Derive the sliding RDFT structure presented in Figure 7.14.

P7.14 Derive the sliding DHT structure presented in Figure 7.15.


6 This problem has been designed based on the work of Widrow and Walach (1984). 
P7.15 Derive the sliding DCT structure presented in Figure 7.16.

P7.16 Derive the sliding DST structure presented in Figure 7.17.

P7.17 Derive a sliding DCT structure for the case where N = 8.

P7.18 Derive a sliding DHT structure for the case where N = 7.

P7.19 The transfer functions associated with the WHT cannot be written in a recursive
form such as those given in Table 7.3 for the other transforms. However, we still
find that each WHT filter may be implemented as a cascade of $\log_2 N$ nonrecursive
sparse coefficient filters similar to the other transforms. In this problem,
we clarify this by exploring the WHT for the transformation length N = 8. The
generalization of the results to any value of N, which is a power of 2, is then
obvious.


(i) Use Eq. (7.69) to find the coefficients of the WHT when N = 8.
(ii) Use the results of (i) to write down the transfer functions associated with
various rows of the WHT when N = 8.
(iii) Show that the transfer functions obtained in (ii) can be factorized as

$$\frac{1}{\sqrt{8}}\,(1 \pm z^{-1})(1 \pm z^{-2})(1 \pm z^{-4})$$

where the various combinations of the ± signs cover all the eight filter
transfer functions.

(iv) Using the latter factorization, propose a tree structure, similar to the nonrecursive
sliding transforms introduced in Section 7.8.3, for an O(N) implementation
of the WHT.


Computer-Oriented Problems 


P7.20 Consider a modeling problem where a plant

$$W_o(z) = 0.4 + z^{-1} - 0.3 z^{-2}$$

is modeled using a 15-tap transversal adaptive filter. The plant is assumed to be
noise free. The input to the plant and adaptive filter is generated by passing a
unit-variance white process through the coloring filter

$$H(z) = 0.1 - 0.3 z^{-1} - 0.5 z^{-2} + z^{-3} + z^{-4} - 0.5 z^{-5} - 0.3 z^{-6} + 0.1 z^{-7}$$


(i) Write a program to simulate this scenario. In your program after every 10 
iterations, plot the magnitude response of the adaptive filter and observe how 
it converges toward the magnitude response of the plant. 

(ii) Obtain and plot the power spectral density of the adaptive filter input and try
to relate that to your observation in (i). You should find that the convergence
of the magnitude response of the adaptive filter toward the plant response
is frequency dependent. Over the frequency bands where the filter input has
higher power, convergence is faster. On the other hand, the slow modes of
the adaptive filter correspond to the bands where the filter input is poorly
excited, that is, having low power spectral density.


P7.21 Repeat Problem P7.20 when the LMS algorithm is replaced by a DCT-based
TDLMS algorithm. Study the performance of the algorithm with and without
step-normalization.


P7.22 Repeat Problem P7.20 when the LMS algorithm is replaced by a DST-based
TDLMS algorithm. Compare your results here with those of Problem P7.21.


P7.23 Repeat Problem P7.20 when the LMS algorithm is replaced by a DCT-based
TDLMS algorithm and

$$H(z) = 0.1 + 0.3 z^{-1} + 0.5 z^{-2} + z^{-3} + z^{-4} - 0.5 z^{-5} + 0.3 z^{-6} + 0.1 z^{-7}$$


P7.24 Repeat Problem P7.20 when the LMS algorithm is replaced by a DST-based
TDLMS algorithm and

$$H(z) = 0.1 + 0.3 z^{-1} + 0.5 z^{-2} + z^{-3} + z^{-4} - 0.5 z^{-5} + 0.3 z^{-6} + 0.1 z^{-7}$$

Compare your results here with those of Problem P7.23.


P7.25 Develop and run your own program(s) to confirm the results of Table 7.2.


P7.26 Consider a modeling problem where the plant is a 16-tap transversal filter. The
plant output is contaminated with an additive white noise, e(n), with variance
$\sigma_e^2 = 10^{-4}$. The plant input is generated by passing a unit variance white process
through a coloring filter. Here, we consider the following choices of the
coloring filter:

$$H_1(z) = 0.1 + 0.2 z^{-1} + 0.3 z^{-2} + 0.4 z^{-3} + 0.4 z^{-4} + 0.2 z^{-5} + 0.1 z^{-6},$$

$$H_2(z) = 0.1 - 0.2 z^{-1} - 0.3 z^{-2} + 0.4 z^{-3} + 0.4 z^{-4} - 0.2 z^{-5} - 0.1 z^{-6},$$

and

$$H_3(z) = 0.1 - 0.2 z^{-1} + 0.3 z^{-2} - 0.4 z^{-3} + 0.4 z^{-4} - 0.2 z^{-5} + 0.1 z^{-6}$$

Note that the first two filters are those which were used in Section 7.6.4 to obtain
the results of Table 7.2. We also note that the outputs of $H_1(z)$ and $H_2(z)$ are
lowpass and bandpass processes, respectively (Figures 7.6 and 7.7). The coloring
filter $H_3(z)$ generates a highpass process.

Develop a program (or a set of programs) to study the convergence behavior of
the TDLMS algorithm for these choices of input and various choices of transforms.
Examine your results and see how consistent they are with the general
conclusions of Section 7.6.


8 


Block Implementation of 
Adaptive Filters 


There are certain applications of signal processing that require adaptive filters whose 
length exceeds a few hundreds or even a few thousands of taps. For instance, to prevent 
the return of speaker echo to the far-end side of the telephone line, in the application 
of hands-free telephony, the use of an acoustic echo canceler whose length exceeds a
few thousand taps is not uncommon. Other applications, such as active noise control 
and equalization of some communication channels, may also require adaptive filters with 
exceedingly long lengths. In such applications, one finds that even the conventional LMS 
algorithm, which is known for its simplicity, is computationally expensive to implement. 

In this chapter, we show how block processing of the data samples can significantly 
reduce the computational complexity of adaptive filters. In block processing (or block 
implementation), a block of samples of the filter input and desired output are collected 
and then processed together to obtain a block of output samples. Thus, the process involves 
serial-to-parallel conversion of the input data, parallel processing of the collected data, and 
parallel-to-serial conversion of the generated output data. This is illustrated in Figure 8.1. 
The computational complexity of the adaptive filter can then be reduced significantly 
through elegant parallel processing of the data samples. We note that the parallel pro- 
cessing involved in Figure 8.1 is repeated only after collection of every block of data 
samples. Thus, a good measure of the computational complexity in a block processing 
system is given by the number of operations required to process one block of data divided 
by the block length. We may then note that the sharing of the processing time among the 
samples in each block is the key to achieving high computational efficiency. 

In this chapter, we discuss an efficient technique for block processing of data samples 
in the adaptive filtering context. This involves a special implementation of the LMS 
algorithm, which is called block LMS (BLMS). We introduce a computationally efficient 
implementation of the BLMS algorithm in the frequency domain. This is called the fast 
BLMS (FBLMS) algorithm. The high computational efficiency of the FBLMS algorithm 
is achieved by employing the following result from the theory of digital signal processing 
(DSP): linear convolution of time domain sequences can be efficiently implemented using 
frequency domain processing. In particular, the linear convolution of an indefinite length 
sequence, x(n), with a finite length sequence, $h_i$ (which may be the impulse 
response of an FIR filter), is obtained by partitioning x(n) into a set of overlapping finite 
duration blocks, finding the circular convolution of $h_i$ (appended with some extra zeros) 
with these blocks, and then choosing the portions of the circular convolutions that 
match the desired linear convolution samples. The circular convolutions can be performed 
very efficiently in the frequency domain, using the properties of the discrete Fourier 
transform (DFT). 

Figure 8.1 Schematic of a block processing system. 

Throughout this chapter, we adopt the following notation. As in the previous chapters, 
bold lowercase letters represent vectors, bold uppercase letters denote matrices, and 
non-bold lowercase letters represent scalars. As before, we use “n” as the time (sample) index. 
The letter “k” is reserved for the block index. The subscript $\mathcal{F}$ is used to refer to frequency 
domain signals; for example, the DFT of the time domain vector $\mathbf{x}$ is denoted as $\mathbf{x}_{\mathcal{F}}$. In 
the derivations that follow, we frequently need to extend vectors and 
matrices to certain dimensions by appending zeros. We use $\mathbf{0}$ (in bold) to refer to 
zero vectors and zero matrices; the dimensions of these zero vectors and matrices 
will be clear from the context. 

Our discussion in this chapter is limited to the case where the filter input, x(n), and 
the desired output, d(n), are real-valued processes. However, we note that the frequency 
domain equivalents of these processes are complex-valued and, hence, the LMS recursion 
that is used is the complex LMS algorithm. 


8.1 Block LMS Algorithm 


The conventional LMS algorithm, which was introduced in Chapter 6, uses the following 
recursion to adjust the tap weights of an adaptive filter: 


$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n)\mathbf{x}(n) \tag{8.1}$$

where $\mathbf{x}(n) = [x(n)\; x(n-1)\; \cdots\; x(n-N+1)]^T$ and $\mathbf{w}(n) = [w_0(n)\; w_1(n)\; \cdots\; w_{N-1}(n)]^T$ 
are the column vectors consisting of the filter tap inputs and tap weights, respectively, 
$e(n) = d(n) - y(n)$ is the output error, $d(n)$ and $y(n) = \mathbf{w}^T(n)\mathbf{x}(n)$ are the desired and 
actual outputs of the filter, respectively, and $\mu$ is the step-size parameter. We also recall 
that the conventional LMS algorithm is a stochastic implementation of the steepest-descent 
method using the instantaneous gradient vector 

$$\nabla_{\mathbf{w}} e^2(n) = -2e(n)\mathbf{x}(n) \tag{8.2}$$


The BLMS algorithm works based on the following strategy. The filter tap weights are 
updated once after collection of every block of data samples. The gradient vector used 
to update the filter tap weights is an average of the instantaneous gradient vectors of the 
form (8.2), which are calculated during the current block. Using k to denote the block 
index, the BLMS recursion is obtained as 


$$\mathbf{w}(k+1) = \mathbf{w}(k) + 2\mu_B \frac{\sum_{i=0}^{L-1} e(kL+i)\,\mathbf{x}(kL+i)}{L} \tag{8.3}$$

where L is the block length and $\mu_B$ is the algorithm step-size parameter. We also note that 
for the computation of the output error samples $e(kL+i) = d(kL+i) - y(kL+i)$, for 
$i = 0, 1, \ldots, L-1$, the output samples $y(kL+i) = \mathbf{w}^T(k)\mathbf{x}(kL+i)$ are calculated using 
the filter tap-weight vector, $\mathbf{w}(k)$, obtained from the update at the end of the previous block. 

The derivations presented in the following sections rely, to a large extent, on the 
vector formulation of the BLMS algorithm. Hence, we now present this formulation. 
Define the matrix 


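$$\mathbf{X}(k) = [\mathbf{x}(kL)\; \mathbf{x}(kL+1)\; \cdots\; \mathbf{x}(kL+L-1)]^T \tag{8.4}$$

and the column vectors 

$$\mathbf{d}(k) = [d(kL)\; d(kL+1)\; \cdots\; d(kL+L-1)]^T \tag{8.5}$$

$$\mathbf{y}(k) = [y(kL)\; y(kL+1)\; \cdots\; y(kL+L-1)]^T \tag{8.6}$$

$$\mathbf{e}(k) = [e(kL)\; e(kL+1)\; \cdots\; e(kL+L-1)]^T \tag{8.7}$$

and note that 

$$\mathbf{y}(k) = \mathbf{X}(k)\mathbf{w}(k) \tag{8.8}$$

and 

$$\mathbf{e}(k) = \mathbf{d}(k) - \mathbf{y}(k) \tag{8.9}$$

We also note that 

$$\sum_{i=0}^{L-1} e(kL+i)\,\mathbf{x}(kL+i) = \mathbf{X}^T(k)\mathbf{e}(k) \tag{8.10}$$

Substituting Eq. (8.10) in Eq. (8.3), we obtain 

$$\mathbf{w}(k+1) = \mathbf{w}(k) + 2\frac{\mu_B}{L}\mathbf{X}^T(k)\mathbf{e}(k) \tag{8.11}$$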


Equations (8.8), (8.9), and (8.11), which correspond to filtering, error estimation, and 
tap-weight vector updating, respectively, define one iteration of the BLMS algorithm. 
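
As an illustration only, the three steps of Eqs. (8.8), (8.9), and (8.11) can be sketched in plain Python as follows. The function name `blms`, the all-zero initial tap weights, the zero-padding of the input before n = 0, and the 3-tap plant used in the usage lines are assumptions made for this sketch; they are not taken from the text or from the program on the accompanying website.

```python
import random


def blms(x, d, N, L, mu_B):
    """Block LMS sketch: w(k+1) = w(k) + 2*(mu_B/L) * X^T(k) e(k), Eq. (8.11)."""
    w = [0.0] * N                      # assumed initial tap weights
    y_all, e_all = [], []
    for k in range(len(x) // L):       # one pass per complete block
        grad = [0.0] * N               # accumulates X^T(k) e(k)
        for i in range(L):
            n = k * L + i
            # Tap-input vector x(n) = [x(n), ..., x(n-N+1)]^T (zeros before n = 0).
            xn = [x[n - j] if n - j >= 0 else 0.0 for j in range(N)]
            y = sum(wj * xj for wj, xj in zip(w, xn))   # row i of Eq. (8.8)
            e = d[n] - y                                # row i of Eq. (8.9)
            for j in range(N):
                grad[j] += e * xn[j]
            y_all.append(y)
            e_all.append(e)
        # The tap weights change only here, once per block.
        w = [wj + 2.0 * (mu_B / L) * gj for wj, gj in zip(w, grad)]
    return w, y_all, e_all


# Usage: identify a hypothetical 3-tap plant from noise-free data.
random.seed(0)
plant = [0.5, -0.3, 0.2]
x = [random.uniform(-1.0, 1.0) for _ in range(4000)]
d = [sum(plant[j] * (x[n - j] if n - j >= 0 else 0.0) for j in range(3))
     for n in range(len(x))]
w_hat, _, e = blms(x, d, N=3, L=4, mu_B=0.2)
```

Note that all L outputs of a block are computed with the same weight vector w(k); this is the only structural difference from a sample-by-sample LMS loop.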
On the basis of our background from the method of steepest-descent and, also, the 
conventional LMS algorithm, the following comments may be made, intuitively: 


1. Convergence behavior of the BLMS algorithm is governed by the eigenvalues of the 
correlation matrix $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$. This follows from the fact that, similar to the 
conventional LMS algorithm, the BLMS algorithm is also a stochastic implementation 
of the steepest-descent method. 
2. The BLMS algorithm has N modes of convergence, which are characterized by the 
time constants 

$$\tau_{B,i} = \frac{1}{4\mu_B\lambda_i}, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{8.12}$$

where the $\lambda_i$'s are the eigenvalues of the correlation matrix $\mathbf{R}$. These time constants are 
in units of the iteration (block) interval. 

3. Averaging the instantaneous (stochastic) gradient vectors, as done in the BLMS 
algorithm, results in gradient vectors with a lower variance compared to those in the 
conventional LMS algorithm. This allows the use of a larger step-size parameter for 
the BLMS algorithm compared to the conventional LMS algorithm. For block lengths, 
L, comparable to or less than the filter length, N, and small misadjustments, in the range 
of 10% or less, the misadjustment, $M_B$, of the BLMS algorithm can be approximated by 
the following expression: 

$$M_B \approx \frac{\mu_B}{L}\,\mathrm{tr}[\mathbf{R}] \tag{8.13}$$

This result is derived in Appendix 8A. 
Comparing Eq. (8.13) with Eq. (6.64), and letting $M_B = M$, where M denotes the 
misadjustment of the conventional LMS algorithm, we obtain 

$$\mu_B = L\mu \tag{8.14}$$

where $\mu$ is the step-size parameter of the conventional LMS algorithm. Substituting 
Eq. (8.14) in Eq. (8.12), we get 

$$\tau_{B,i} = \frac{1}{4L\mu\lambda_i}\ \text{block intervals} \tag{8.15}$$

$$\phantom{\tau_{B,i}} = \frac{1}{4\mu\lambda_i}\ \text{sample intervals} \tag{8.16}$$

Comparing this result with Eq. (6.33) and recalling that the time constants associated 
with the conventional LMS algorithm are in sample intervals, we conclude that the 
convergence behaviors of the BLMS and conventional LMS algorithms are the same. 
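
The arithmetic behind this conclusion can be checked with a few lines of Python. This is only a numerical illustration: the helper names and the eigenvalues, step size, and block length in the usage lines are hypothetical values chosen for the sketch.

```python
def lms_time_constants(mu, eigenvalues):
    """tau_i = 1/(4*mu*lambda_i), in sample intervals (conventional LMS)."""
    return [1.0 / (4.0 * mu * lam) for lam in eigenvalues]


def blms_time_constants(mu_B, eigenvalues, L):
    """Eq. (8.12) in block intervals, plus the same values converted to samples."""
    in_blocks = [1.0 / (4.0 * mu_B * lam) for lam in eigenvalues]
    in_samples = [L * tau for tau in in_blocks]   # one block spans L samples
    return in_blocks, in_samples


# Usage with hypothetical eigenvalues of R and mu_B = L*mu as in Eq. (8.14).
eigs = [0.25, 1.0, 4.0]
mu, L = 0.01, 8
blocks, samples = blms_time_constants(L * mu, eigs, L)
reference = lms_time_constants(mu, eigs)
```

With the step sizes tied by Eq. (8.14), `samples` and `reference` agree mode by mode, while `blocks` is smaller by the factor L.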


The following example illustrates the above remarks. 


Example 8.1 


Let us consider the modeling problem discussed in Section 6.4.1 and use the signal 
coloring filter $H_1(z)$ of Eq. (6.80) to generate the input process, x(n). Figure 8.2 shows 
the results of simulations that compare the conventional LMS algorithm and the BLMS 
algorithm for different choices of the block length, L. The results presented here are 
based on an ensemble average of 100 independent runs for each plot. The step-size 
parameters $\mu$ and $\mu_B$ have been selected according to Eqs. (6.64) and (8.13), respectively, 
for 10% misadjustment. We note that the difference between the various learning curves 
in Figure 8.2 is negligible. This confirms the theoretical predictions made above, which 
suggest that the BLMS and conventional LMS algorithms perform the same. 

Figure 8.2 Convergence behavior of the BLMS algorithm for various values of the block length, 
L. Results of the conventional LMS algorithm are also shown for comparison. The step-size 
parameters $\mu$ and $\mu_B$ are selected based on Eqs. (6.64) and (8.13), respectively, for 10% misadjustment. 

The program used to generate the results of Figure 8.2 is available on the accompanying 
website. It is called blk_mdlg.m. The reader is encouraged to try this program for 
different choices of misadjustment and block length to study the effect of variations of 
these parameters on the behavior of the BLMS algorithm. 


8.2 Mathematical Background 


The mathematical and signal processing tools required for the rest of this chapter are 
briefly reviewed in this section. In particular, we discuss how time domain linear convo- 
lutions can be efficiently performed using DFT (Oppenheim and Schafer, 1975, 1989). 
We also introduce circular matrices and review some of their properties that are relevant 
to our study of BLMS algorithms. 


8.2.1 Linear Convolution Using the Discrete Fourier Transform 


We consider the filtering of a sequence x(n) through an FIR filter with coefficients 
$w_0, w_1, \ldots, w_{N-1}$. This involves computation of the linear convolution 

$$y(n) = \sum_{i=0}^{N-1} w_i\,x(n-i) \tag{8.17}$$

This process requires N multiplications and N − 1 additions for computing every sample 
of the output, y(n). When N is large, the samples of y(n) can be obtained with a reduced 
number of multiplications and additions, as discussed below. 

Let us define the column vector $\tilde{\mathbf{x}}(k)$ of length $N' = N + L - 1$ as 

$$\tilde{\mathbf{x}}(k) = [x(kL-N+1)\; x(kL-N+2)\; \cdots\; x(kL+L-1)]^T \tag{8.18}$$

and $\tilde{\mathbf{w}}(k)$ of length $N'$ as 

$$\tilde{\mathbf{w}}(k) = \begin{bmatrix} \mathbf{w}(k) \\ \mathbf{0} \end{bmatrix} \tag{8.19}$$

where $\mathbf{w}(k) = [w_0(k)\; w_1(k)\; \cdots\; w_{N-1}(k)]^T$ is the filter tap-weight vector, and $\mathbf{0}$ refers 
to a column vector consisting of L − 1 zeros. In order to maintain uniformity in the 
derivations of the subsequent sections, the block index k has been added to the filter tap 
weights, indicating that the weights vary only from block to block, as happens in the 
implementation of the BLMS algorithm. 

From the properties of the DFT, we know that the circular convolution of $\tilde{\mathbf{w}}(k)$ and 
$\tilde{\mathbf{x}}(k)$ can be obtained by transforming both vectors to their respective frequency domain 
equivalents (using the DFT), performing an element-wise multiplication on the 
transformed samples, and transforming the result back to the time domain (using the inverse 
DFT (IDFT)). This process can be efficiently implemented using the fast Fourier 
transform (FFT) and inverse FFT (IFFT) algorithms. Examining the circular convolution of 
$\tilde{\mathbf{w}}(k)$ and $\tilde{\mathbf{x}}(k)$ reveals that only the last L elements of the result coincide with the 
corresponding elements of the linear convolution (8.17); see Oppenheim and Schafer (1975), 
for example.¹ The rest of the elements of the circular convolution do not provide any 
useful result, as the elements of $\tilde{\mathbf{x}}(k)$ are wrapped around and are not in the right order, 
as required by the linear convolution (8.17). The computation of the circular convolution 
of $\tilde{\mathbf{w}}(k)$ and $\tilde{\mathbf{x}}(k)$ and the wraparound phenomenon are summarized as 

$$
\begin{bmatrix} * \\ * \\ \vdots \\ * \\ y(kL) \\ y(kL+1) \\ \vdots \\ y(kL+L-1) \end{bmatrix}
=
\begin{bmatrix}
x(kL-N+1) & x(kL+L-1) & x(kL+L-2) & \cdots & x(kL-N+2) \\
x(kL-N+2) & x(kL-N+1) & x(kL+L-1) & \cdots & x(kL-N+3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x(kL-1) & x(kL-2) & x(kL-3) & \cdots & x(kL) \\
x(kL) & x(kL-1) & x(kL-2) & \cdots & x(kL+1) \\
x(kL+1) & x(kL) & x(kL-1) & \cdots & x(kL+2) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x(kL+L-1) & x(kL+L-2) & x(kL+L-3) & \cdots & x(kL-N+1)
\end{bmatrix}
\begin{bmatrix} w_0(k) \\ w_1(k) \\ \vdots \\ w_{N-1}(k) \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\tag{8.20}
$$


In Eq. (8.20), the elements represented by asterisks correspond to circular 
convolution results, which do not coincide with linear convolution samples, as required by 
Eq. (8.17). Careful examination of the summations related to these elements reveals that 
the input samples experience some discontinuity in their order. For example, a jump from 
$x(kL-N+1)$ to $x(kL+L-1)$ is observed in the first row of the data matrix on the 
right-hand side of Eq. (8.20). When such discontinuities overlap with the nonzero 
portion of $\tilde{\mathbf{w}}(k)$, the corresponding output samples will not correspond to valid linear 
convolution samples. 

¹ In the original derivation of the FBLMS algorithm by Ferrara and Widrow (1981), and most of the subsequent 
publications on this, the block length, L, is chosen equal to the filter length, N. Also, the column vector $\mathbf{0}$ in 
Eq. (8.19) has been assumed to be of length L = N, and not L − 1, as we assume here. 

The procedure explained by Eq. (8.20) is commonly known as the overlap-save method. 
This name reflects the fact that in each block of input, $\tilde{\mathbf{x}}(k)$ consists of L new samples and 
N − 1 overlapped samples from the previous block(s). Another method for computing 
linear convolutions using the DFT, equally efficient in general, is the overlap-add method. 
However, the overlap-add method has been found to be computationally less efficient than the 
overlap-save method when applied to the implementation of the BLMS algorithm. Noting 
this, we do not discuss the overlap-add method in this book. 
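
A minimal sketch of one overlap-save block, written in plain Python, may help to make the procedure concrete. The helper names (`dft`, `idft`, `overlap_save_block`) and the numbers in the usage lines are assumptions of this sketch. A naive O(M²) DFT is used here only to keep the example self-contained; in practice an FFT routine would take its place.

```python
import cmath


def dft(v):
    """Naive O(M^2) DFT, standing in for an FFT."""
    M = len(v)
    return [sum(v[m] * cmath.exp(-2j * cmath.pi * l * m / M) for m in range(M))
            for l in range(M)]


def idft(V):
    """Naive inverse DFT, standing in for an IFFT."""
    M = len(V)
    return [sum(V[l] * cmath.exp(2j * cmath.pi * l * m / M) for l in range(M)) / M
            for m in range(M)]


def overlap_save_block(x_ext, w, L):
    """Return y(kL), ..., y(kL+L-1) from the extended input of Eq. (8.18)."""
    N = len(w)
    assert len(x_ext) == N + L - 1
    w_ext = list(w) + [0.0] * (L - 1)                 # Eq. (8.19)
    product = [a * b for a, b in zip(dft(x_ext), dft(w_ext))]
    circular = idft(product)                          # circular convolution
    return [c.real for c in circular[-L:]]            # keep the last L samples only


# Usage with hypothetical real-valued data: block k = 2, N = 3 taps, L = 4.
x = [0.3, -1.2, 0.7, 2.0, -0.5, 1.1, 0.0, -0.9, 0.4, 1.6, -2.1, 0.8]
w = [1.0, 0.5, -0.25]
N, L, k = 3, 4, 2
x_ext = x[k * L - N + 1: k * L + L]                   # x(kL-N+1), ..., x(kL+L-1)
y_block = overlap_save_block(x_ext, w, L)
y_direct = [sum(w[i] * x[n - i] for i in range(N)) for n in range(k * L, k * L + L)]
```

The first N − 1 elements of `circular` are the wrapped-around samples marked by asterisks in Eq. (8.20) and are simply discarded.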


8.2.2 Circular Matrices 


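Circular matrices are used extensively in the derivation and analysis of the FBLMS 
algorithm. Hence, it is useful, and indeed necessary, to have a good understanding of 
the properties of these matrices before we start our discussion on the FBLMS algorithm. 
Consider the M-by-M circular matrix 

$$
\mathbf{A}_c = \begin{bmatrix}
a_0 & a_{M-1} & a_{M-2} & \cdots & a_1 \\
a_1 & a_0 & a_{M-1} & \cdots & a_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{M-2} & a_{M-3} & a_{M-4} & \cdots & a_{M-1} \\
a_{M-1} & a_{M-2} & a_{M-3} & \cdots & a_0
\end{bmatrix}
\tag{8.21}
$$

Clearly, the name “circular” refers to the fact that each row (column) of $\mathbf{A}_c$ is obtained 
by circularly shifting the previous row (column) by one element. A special property of 
circular matrices, which is extensively used in the following sections, is that such matrices 
are diagonalized by DFT matrices. That is, if $\mathcal{F}$ is the M-by-M DFT matrix defined as 

$$
\mathcal{F} = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & e^{-j\frac{2\pi}{M}} & e^{-j\frac{4\pi}{M}} & \cdots & e^{-j\frac{2\pi(M-1)}{M}} \\
1 & e^{-j\frac{4\pi}{M}} & e^{-j\frac{8\pi}{M}} & \cdots & e^{-j\frac{4\pi(M-1)}{M}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & e^{-j\frac{2\pi(M-1)}{M}} & e^{-j\frac{4\pi(M-1)}{M}} & \cdots & e^{-j\frac{2\pi(M-1)^2}{M}}
\end{bmatrix}
\tag{8.22}
$$

then 

$$\mathbf{A}_{\mathcal{F}} = \mathcal{F}\mathbf{A}_c\mathcal{F}^{-1} \tag{8.23}$$

is a diagonal matrix. Furthermore, the diagonal elements of $\mathbf{A}_{\mathcal{F}}$ correspond to the DFT 
of the first column of $\mathbf{A}_c$. In matrix notation, this may be written as 

$$\mathbf{A}_{\mathcal{F}} = \mathrm{diag}[\mathbf{a}_{\mathcal{F}}] \tag{8.24}$$

where $\mathbf{a}_{\mathcal{F}} = \mathcal{F}\mathbf{a}$, $\mathbf{a} = [a_0\; a_1\; \cdots\; a_{M-1}]^T$ is the first column of $\mathbf{A}_c$, and $\mathrm{diag}[\mathbf{a}_{\mathcal{F}}]$ denotes 
the diagonal matrix consisting of the elements of $\mathbf{a}_{\mathcal{F}}$. This can be proved as follows. 

Since $\mathcal{F}$ is a DFT matrix, recall that 

$$\mathcal{F}^{-1} = \frac{1}{M}\mathcal{F}^{*} \tag{8.25}$$

where M is the length of the DFT and the asterisk denotes complex conjugation. In other words, 
the lth column of $\mathcal{F}^{-1}$ can be given as 

$$\mathbf{g}_l = \frac{1}{M}\mathbf{f}_l^{*} \tag{8.26}$$

where $\mathbf{f}_l$ is the lth column of the DFT matrix $\mathcal{F}$, given by 

$$\mathbf{f}_l = \left[1\;\; e^{-j\frac{2\pi l}{M}}\;\; e^{-j\frac{4\pi l}{M}}\;\; \cdots\;\; e^{-j\frac{2\pi(M-1)l}{M}}\right]^T \tag{8.27}$$

Next, by direct insertion, one can easily show that (Problem P8.2) 

$$\mathbf{A}_c\mathbf{g}_l = a_{\mathcal{F},l}\,\mathbf{g}_l, \quad \text{for } l = 0, 1, \ldots, M-1 \tag{8.28}$$

where $a_{\mathcal{F},l} = \sum_{m=0}^{M-1} a_m e^{-j\frac{2\pi ml}{M}}$ is the lth element of $\mathbf{a}_{\mathcal{F}}$. Using Eq. (8.24), the M equations 
in (8.28) may be put together to obtain 

$$\mathbf{A}_c\mathcal{F}^{-1} = \mathcal{F}^{-1}\mathbf{A}_{\mathcal{F}} \tag{8.29}$$

Premultiplying Eq. (8.29) on both sides by $\mathcal{F}$ gives Eq. (8.23). 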
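
The diagonalization property can also be checked numerically. The following plain Python sketch builds a small circular matrix from a hypothetical first column and evaluates Eq. (8.23) directly; the helper names and the numerical values are assumptions of this sketch.

```python
import cmath


def circulant(a):
    """Circular matrix of Eq. (8.21): element (r, c) is a[(r - c) mod M]."""
    M = len(a)
    return [[a[(r - c) % M] for c in range(M)] for r in range(M)]


def dft_matrix(M):
    """DFT matrix of Eq. (8.22): element (r, c) is exp(-j*2*pi*r*c/M)."""
    return [[cmath.exp(-2j * cmath.pi * r * c / M) for c in range(M)]
            for r in range(M)]


def matmul(A, B):
    return [[sum(A[r][m] * B[m][c] for m in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


a = [2.0, -1.0, 0.5, 3.0, 0.25]            # hypothetical first column of A_c
M = len(a)
A_c = circulant(a)
F = dft_matrix(M)
F_inv = [[F[r][c].conjugate() / M for c in range(M)] for r in range(M)]  # Eq. (8.25)
A_F = matmul(matmul(F, A_c), F_inv)                                       # Eq. (8.23)
a_F = [sum(F[l][m] * a[m] for m in range(M)) for l in range(M)]           # a_F = F a

# Largest off-diagonal magnitude, and largest mismatch between diag(A_F) and a_F.
off_diag = max(abs(A_F[r][c]) for r in range(M) for c in range(M) if r != c)
diag_err = max(abs(A_F[l][l] - a_F[l]) for l in range(M))
```

Both `off_diag` and `diag_err` come out at the level of floating-point roundoff, in agreement with Eq. (8.24).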

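Another important result on circular matrices that will be useful for our later 
application is derived next. Applying Hermitian transposition on both sides of Eq. (8.23), we 
obtain 

$$\mathbf{A}_{\mathcal{F}}^{H} = \mathcal{F}^{-H}\mathbf{A}_c^{H}\mathcal{F}^{H} \tag{8.30}$$

where $\mathcal{F}^{-H}$ is the shorthand notation for $(\mathcal{F}^{-1})^{H}$. Since $\mathbf{A}_{\mathcal{F}}$ is diagonal, $\mathbf{A}_{\mathcal{F}}^{H} = \mathbf{A}_{\mathcal{F}}^{*}$. 
Furthermore, from Eqs. (8.22) and (8.25), $\mathcal{F}^{-H} = \frac{1}{M}\mathcal{F}$ and $\mathcal{F}^{H} = M\mathcal{F}^{-1}$, as $\mathcal{F}^{T} = \mathcal{F}$. 
Using these in Eq. (8.30), we get 

$$\mathbf{A}_{\mathcal{F}}^{*} = \mathcal{F}\mathbf{A}_c^{H}\mathcal{F}^{-1} \tag{8.31}$$

When the elements of $\mathbf{A}_c$ are real-valued, $\mathbf{A}_c^{H} = \mathbf{A}_c^{T}$ and, thus, Eq. (8.31) may be written as 

$$\mathbf{A}_{\mathcal{F}}^{*} = \mathcal{F}\mathbf{A}_c^{T}\mathcal{F}^{-1} \tag{8.32}$$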


8.2.3 Window Matrices and Matrix Formulation of the 
Overlap-Save Method 


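Let us define the $N'$-by-$N'$ circular matrix, for $N' = L + N - 1$, as 

$$
\mathbf{X}_c(k) = \begin{bmatrix}
x(kL-N+1) & x(kL+L-1) & x(kL+L-2) & \cdots & x(kL-N+2) \\
x(kL-N+2) & x(kL-N+1) & x(kL+L-1) & \cdots & x(kL-N+3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x(kL+L-1) & x(kL+L-2) & x(kL+L-3) & \cdots & x(kL-N+1)
\end{bmatrix}
\tag{8.33}
$$

We note that this is nothing but the data matrix on the right side of Eq. (8.20). 
We also define the length $N'$ column vector 

$$\tilde{\mathbf{y}}(k) = \begin{bmatrix} \mathbf{0} \\ \mathbf{y}(k) \end{bmatrix} \tag{8.34}$$

where $\mathbf{y}(k)$, as defined in Eq. (8.6), is the column vector consisting of the output samples 
of the kth block, and $\mathbf{0}$ is the length N − 1 zero vector. Let us denote the column vector 
that appears on the left-hand side of Eq. (8.20) by $\mathbf{y}_c(k)$, and note that $\tilde{\mathbf{y}}(k)$ can be obtained 
from $\mathbf{y}_c(k)$ by substituting all the $*$ elements in the latter with zeros. This substitution 
can be written in the form of a matrix-vector product as 

$$\tilde{\mathbf{y}}(k) = \mathbf{P}_{0,L}\,\mathbf{y}_c(k) \tag{8.35}$$

where $\mathbf{P}_{0,L}$ is the $N'$-by-$N'$ windowing matrix defined as 

$$\mathbf{P}_{0,L} = \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_L \end{bmatrix} \tag{8.36}$$

with $\mathbf{I}_L$ being the L-by-L identity matrix, and the $\mathbf{0}$'s being zero matrices with appropriate 
dimensions. Using Eqs. (8.20), (8.19), and the above definitions, we obtain 

$$\tilde{\mathbf{y}}(k) = \mathbf{P}_{0,L}\mathbf{X}_c(k)\tilde{\mathbf{w}}(k) \tag{8.37}$$

Implementation of Eq. (8.37) in the frequency domain can now be obtained by simply 
noting that Eq. (8.37) may be written as 

$$\tilde{\mathbf{y}}(k) = \mathbf{P}_{0,L}\mathcal{F}^{-1}\mathcal{F}\mathbf{X}_c(k)\mathcal{F}^{-1}\mathcal{F}\tilde{\mathbf{w}}(k) \tag{8.38}$$

where $\mathcal{F}$ is the $N'$-by-$N'$ DFT matrix. Next, define 

$$\mathbf{w}_{\mathcal{F}}(k) = \mathcal{F}\tilde{\mathbf{w}}(k) \tag{8.39}$$

and 

$$\mathcal{X}_{\mathcal{F}}(k) = \mathcal{F}\mathbf{X}_c(k)\mathcal{F}^{-1} \tag{8.40}$$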


and note that $\mathcal{X}_{\mathcal{F}}(k)$ is the diagonal matrix consisting of the elements of the DFT of the 
first column of $\mathbf{X}_c(k)$, as the latter is a circular matrix. We also note that the first column 
of $\mathbf{X}_c(k)$ is the input vector $\tilde{\mathbf{x}}(k)$, as defined in Eq. (8.18). Using Eqs. (8.39) and (8.40) 
in Eq. (8.38), we obtain 

$$\tilde{\mathbf{y}}(k) = \mathbf{P}_{0,L}\mathcal{F}^{-1}\mathcal{X}_{\mathcal{F}}(k)\mathbf{w}_{\mathcal{F}}(k) \tag{8.41}$$

This equation has the following interpretation. Since $\mathcal{X}_{\mathcal{F}}(k)$ is diagonal, $\mathcal{X}_{\mathcal{F}}(k)\mathbf{w}_{\mathcal{F}}(k)$ is 
nothing but the element-wise multiplication of the filter input and its coefficients in the 
frequency domain. This gives the output samples of the filter in the frequency domain. 
Premultiplication of this result by $\mathcal{F}^{-1}$ converts the frequency domain samples of the 
output to the time domain. Furthermore, premultiplying the result by the windowing 
matrix $\mathbf{P}_{0,L}$ selects only those samples that coincide with the required linear 
convolution samples. 

With the background developed in this section, we are now ready to proceed with the 
derivation and analysis of the FBLMS algorithm. 


8.3 The FBLMS Algorithm 


The FBLMS algorithm, as mentioned in the introduction, is nothing but a fast (numerically 
efficient) implementation of the BLMS algorithm in the frequency domain. 

Equation (8.41) corresponds to the filtering part of the FBLMS algorithm. Element-by-element 
multiplication of the frequency domain samples of the input and filter coefficients 
is followed by an IDFT and a proper windowing of the result to obtain the output vector 
$\tilde{\mathbf{y}}(k)$, in the extended form, as defined by Eq. (8.34). The vector of desired outputs, in the 
extended form, is defined as 

$$\tilde{\mathbf{d}}(k) = \begin{bmatrix} \mathbf{0} \\ \mathbf{d}(k) \end{bmatrix} \tag{8.42}$$

where $\mathbf{d}(k)$ is defined by Eq. (8.5), and $\mathbf{0}$ is the N − 1 element zero column vector. We 
also define the extended error vector 

$$\tilde{\mathbf{e}}(k) = \tilde{\mathbf{d}}(k) - \tilde{\mathbf{y}}(k) \tag{8.43}$$


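To obtain the frequency domain equivalent of the recursion (8.11), we replace $\mathbf{w}(k)$ 
and $\mathbf{e}(k)$ by their extended versions and note that Eq. (8.11) may also be written as 

$$\tilde{\mathbf{w}}(k+1) = \tilde{\mathbf{w}}(k) + 2\mu\,\mathbf{P}_{N,0}\mathbf{X}_c^{T}(k)\tilde{\mathbf{e}}(k) \tag{8.44}$$

where $\mathbf{X}_c(k)$ is the circular matrix of samples of the filter input as defined by Eq. (8.33), 
$\mu = \mu_B/L$, and 

$$\mathbf{P}_{N,0} = \begin{bmatrix} \mathbf{I}_N & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \tag{8.45}$$

is an $N'$-by-$N'$ windowing matrix that ensures that the last L − 1 elements of the updated 
weight vector $\tilde{\mathbf{w}}(k+1)$ remain equal to zero after each iteration of Eq. (8.44). The fact 
that Eqs. (8.11) and (8.44) are equivalent can easily be shown by substituting for the 
vectors and matrices in Eq. (8.44) and expanding the result (Problem P8.4). 

Conversion of the recursion (8.44) to its frequency domain equivalent can be done by 
premultiplying it on both sides by the DFT matrix $\mathcal{F}$ and using the identity $\mathcal{F}^{-1}\mathcal{F} = \mathbf{I}$ 
to obtain 

$$\mathbf{w}_{\mathcal{F}}(k+1) = \mathbf{w}_{\mathcal{F}}(k) + 2\mu\,\mathcal{F}\mathbf{P}_{N,0}\mathcal{F}^{-1}\mathcal{F}\mathbf{X}_c^{T}(k)\mathcal{F}^{-1}\mathcal{F}\tilde{\mathbf{e}}(k) \tag{8.46}$$

Using Eq. (8.40) and the identity (8.32), Eq. (8.46) can be written as 

$$\mathbf{w}_{\mathcal{F}}(k+1) = \mathbf{w}_{\mathcal{F}}(k) + 2\mu\,\mathcal{P}_{N,0}\mathcal{X}_{\mathcal{F}}^{*}(k)\mathbf{e}_{\mathcal{F}}(k) \tag{8.47}$$

where $\mathbf{e}_{\mathcal{F}}(k) = \mathcal{F}\tilde{\mathbf{e}}(k)$ and 

$$\mathcal{P}_{N,0} = \mathcal{F}\mathbf{P}_{N,0}\mathcal{F}^{-1} \tag{8.48}$$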


Equations (8.41), (8.43), and (8.47) are the three steps required to complete each 
iteration of the FBLMS algorithm, namely, filtering, error estimation, and tap-weight 
adaptation, respectively. Figure 8.3 depicts a block diagram of the FBLMS algorithm, 
which shows how these steps are realized efficiently. The input samples are collected in 
an input buffer whose output is the vector $\tilde{\mathbf{x}}(k)$, consisting of L new samples and N − 1 
samples from the previous block(s). The vector $\tilde{\mathbf{x}}(k)$ is converted to the frequency domain 
and multiplied by the associated tap-weight vector, $\mathbf{w}_{\mathcal{F}}(k)$, on an element-wise basis. This 
gives the samples of the filter output in the frequency domain, which are subsequently 
converted to the time domain using an IFFT. The last L samples of this result correspond 
to the output samples of the current block and are sent to the output buffer as well as the 
error estimation section. The error vector, $\mathbf{e}(k)$, which consists of L elements, is extended 
to the length of N + L − 1 by appending N − 1 zeros at its beginning and converted to 
the frequency domain using an FFT algorithm. An element-wise multiplication of the error 
and the conjugate of the input samples is performed in the frequency domain and the result 
is used to update the filter tap weights. Premultiplication of the gradient vector $\mathcal{X}_{\mathcal{F}}^{*}(k)\mathbf{e}_{\mathcal{F}}(k)$ 
by $\mathcal{P}_{N,0}$ is necessary to ensure that the last L − 1 elements of the time domain 
equivalent of the tap-weight vector $\mathbf{w}_{\mathcal{F}}(k)$ are constrained to zero (Eq. 8.19). This constraining 
operation is implemented by converting the gradient vector $\mathcal{X}_{\mathcal{F}}^{*}(k)\mathbf{e}_{\mathcal{F}}(k)$ to the time domain, 
making the last L − 1 elements zero, and converting back to the frequency domain, as 
shown in Figure 8.3. 

Figure 8.3 Implementation of the FBLMS algorithm. 


8.3.1 Constrained and Unconstrained FBLMS Algorithms 


Mansour and Gray (1982) have shown that under fairly mild conditions, the FBLMS 
algorithm can work well even when the tap-weight constraining matrix $\mathcal{P}_{N,0}$ is dropped 
from Eq. (8.47). They have shown that when the filter length, N, is chosen sufficiently 
large, and the input process, x(n), does not satisfy some specific conditions (which are 
unlikely to occur in practice), the update equation (8.47) and the recursion 

$$\mathbf{w}_{\mathcal{F}}(k+1) = \mathbf{w}_{\mathcal{F}}(k) + 2\mu\,\mathcal{X}_{\mathcal{F}}^{*}(k)\mathbf{e}_{\mathcal{F}}(k) \tag{8.49}$$

converge to the same set of tap weights. To differentiate between the two cases, Eq. (8.49) 
is called the unconstrained FBLMS recursion, while Eq. (8.47) is referred to as the 
constrained FBLMS recursion. 

The block diagram given in Figure 8.3 is that of the constrained FBLMS algorithm. 
However, it is easily converted to the unconstrained FBLMS algorithm if the gradient- 
constraining operation, enclosed by the dotted line box, is dropped. We may thus note 
that the unconstrained FBLMS algorithm is much simpler to implement as two of the 
five FFTs and IFFTs are deleted from Figure 8.3. As we show in the next section, this 
simplification is at the cost of a higher misadjustment. 


8.3.2 Convergence Behavior of the FBLMS Algorithm 


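In this section, we present a convergence analysis of the FBLMS algorithm. We start with 
the unconstrained recursion (8.49). Substituting Eq. (8.41) in Eq. (8.43), we get 

$$\tilde{\mathbf{e}}(k) = \tilde{\mathbf{d}}(k) - \mathbf{P}_{0,L}\mathcal{F}^{-1}\mathcal{X}_{\mathcal{F}}(k)\mathbf{w}_{\mathcal{F}}(k) \tag{8.50}$$

The fact that the first N − 1 elements of $\tilde{\mathbf{d}}(k)$ are all zero implies that $\tilde{\mathbf{d}}(k) = \mathbf{P}_{0,L}\tilde{\mathbf{d}}(k)$. 
Using this in Eq. (8.50), we obtain 

$$
\begin{aligned}
\tilde{\mathbf{e}}(k) &= \mathbf{P}_{0,L}\left(\tilde{\mathbf{d}}(k) - \mathcal{F}^{-1}\mathcal{X}_{\mathcal{F}}(k)\mathbf{w}_{\mathcal{F}}(k)\right) \\
&= \mathbf{P}_{0,L}\mathcal{F}^{-1}\left(\mathcal{F}\tilde{\mathbf{d}}(k) - \mathcal{X}_{\mathcal{F}}(k)\mathbf{w}_{\mathcal{F}}(k)\right)
\end{aligned}
\tag{8.51}
$$

Premultiplying Eq. (8.51) on both sides by $\mathcal{F}$, we get 

$$\mathbf{e}_{\mathcal{F}}(k) = \mathcal{P}_{0,L}\left(\mathbf{d}_{\mathcal{F}}(k) - \mathcal{X}_{\mathcal{F}}(k)\mathbf{w}_{\mathcal{F}}(k)\right) \tag{8.52}$$

where $\mathbf{d}_{\mathcal{F}}(k) = \mathcal{F}\tilde{\mathbf{d}}(k)$ and 

$$\mathcal{P}_{0,L} = \mathcal{F}\mathbf{P}_{0,L}\mathcal{F}^{-1} \tag{8.53}$$

Substituting Eq. (8.52) in the unconstrained update equation (8.49), we obtain 

$$\mathbf{w}_{\mathcal{F}}(k+1) = \mathbf{w}_{\mathcal{F}}(k) + 2\mu\,\mathcal{X}_{\mathcal{F}}^{*}(k)\mathcal{P}_{0,L}\left(\mathbf{d}_{\mathcal{F}}(k) - \mathcal{X}_{\mathcal{F}}(k)\mathbf{w}_{\mathcal{F}}(k)\right) \tag{8.54}$$

Next, we define the tap-weight error vector 

$$\mathbf{v}_{\mathcal{F}}(k) = \mathbf{w}_{\mathcal{F}}(k) - \mathbf{w}_{o,\mathcal{F}} \tag{8.55}$$

where $\mathbf{w}_{o,\mathcal{F}}$ is the optimum value of the filter tap-weight vector in the frequency domain. 
Using Eq. (8.55) in Eq. (8.54), we obtain, after some simple manipulation, 

$$\mathbf{v}_{\mathcal{F}}(k+1) = \left(\mathbf{I} - 2\mu\,\mathcal{X}_{\mathcal{F}}^{*}(k)\mathcal{P}_{0,L}\mathcal{X}_{\mathcal{F}}(k)\right)\mathbf{v}_{\mathcal{F}}(k) + 2\mu\,\mathcal{X}_{\mathcal{F}}^{*}(k)\mathbf{e}_{o,\mathcal{F}}(k) \tag{8.56}$$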


where $\mathbf{e}_{o,\mathcal{F}}(k)$ is the optimum error vector obtained when $\mathbf{w}_{\mathcal{F}}(k)$ is replaced by $\mathbf{w}_{o,\mathcal{F}}$. 

Now, if we use the independence assumption and follow the same procedure as in 
Section 6.2, we find that the convergence of the unconstrained FBLMS algorithm is 
controlled by the eigenvalues of the matrix 

$$\mathcal{R}^{u}_{xx} = E\left[\mathcal{X}_{\mathcal{F}}^{*}(k)\mathcal{P}_{0,L}\mathcal{X}_{\mathcal{F}}(k)\right] \tag{8.57}$$

The matrix $\mathcal{R}^{u}_{xx}$ may be evaluated as follows. Substituting Eq. (8.40) and Eq. (8.53) in 
Eq. (8.57), we obtain 

$$\mathcal{R}^{u}_{xx} = \mathcal{F}\mathbf{R}^{u}_{xx}\mathcal{F}^{-1} \tag{8.58}$$

where 

$$\mathbf{R}^{u}_{xx} = E\left[\mathbf{X}_c^{T}(k)\mathbf{P}_{0,L}\mathbf{X}_c(k)\right] \tag{8.59}$$

A careful examination of $\mathbf{R}^{u}_{xx}$ reveals that when L and N are large and the autocorrelation 
function of the input process, x(n), that is, $\phi_{xx}(l)$, approaches zero for lag values $l$ 
much smaller than L and N, $\mathbf{R}^{u}_{xx}$ can be approximated by the $N'$-by-$N'$ circular matrix 
whose first column is (Lee and Un, 1989) 

$$\mathbf{r}^{u}_{xx} = L \times [\phi_{xx}(0)\; \phi_{xx}(1)\; \cdots\; \phi_{xx}(l)\; 0\; \cdots\; 0\; \phi_{xx}(l)\; \phi_{xx}(l-1)\; \cdots\; \phi_{xx}(1)]^T \tag{8.60}$$

Using the properties of circular matrices, this implies that² 

$$\mathcal{R}^{u}_{xx} = L \times \mathrm{diag}\left(\Phi_{xx}\!\left(e^{j\frac{2\pi \times 0}{N'}}\right),\; \Phi_{xx}\!\left(e^{j\frac{2\pi}{N'}}\right),\; \ldots,\; \Phi_{xx}\!\left(e^{j\frac{2\pi(N'-1)}{N'}}\right)\right) \tag{8.61}$$

where $\Phi_{xx}(e^{j\omega})$ is the power spectral density of the input process, x(n). The samples of 
$\Phi_{xx}(e^{j\omega})$, on the right-hand side of Eq. (8.61), are obtained by taking the DFT of the 
vector $\mathbf{r}^{u}_{xx}/L$. 

The fact that $\mathcal{R}^{u}_{xx}$ is a diagonal matrix implies that its eigenvalues are equal to its 
diagonal elements. The diagonal elements of $\mathcal{R}^{u}_{xx}$, as specified in Eq. (8.61), in turn, 
are proportional to the samples of the power spectral density of the underlying input 
process. Thus, for colored inputs, as happens with the conventional LMS algorithm, the 
unconstrained FBLMS algorithm will also perform poorly. The same is also true for the 
constrained FBLMS algorithm, as it is nothing but a fast implementation of the BLMS 
algorithm, whose convergence behavior was studied in Section 8.1 and found to be 
very similar to that of the conventional LMS algorithm. 


² See also Problem P8.11 for an alternative derivation of Eq. (8.61). 


8.3.3 Step-Normalization 


Convergence performance of the FBLMS algorithm can be greatly improved using 
individually normalized step-size parameters for each element of the tap-weight vector $\mathbf{w}_{\mathcal{F}}(k)$ 
rather than a common step-size parameter. This technique, known as step-normalization, 
is similar to the one that was described in Chapter 7 for improving the convergence of the 
transform domain LMS (TDLMS) algorithm. It is implemented by replacing the scalar 
step-size parameter $\mu$ by the diagonal matrix 

$$\boldsymbol{\mu}(k) = \mathrm{diag}[\mu_0(k),\, \mu_1(k),\, \ldots,\, \mu_{N'-1}(k)] \tag{8.62}$$

where $\mu_i(k)$ is the normalized step-size parameter for the ith tap. These are obtained 
according to the equations 

$$\mu_i(k) = \frac{\mu_0}{\hat{\sigma}^2_{x_{\mathcal{F},i}}(k)}, \quad \text{for } i = 0, 1, \ldots, N'-1 \tag{8.63}$$

where $\mu_0$ is a common unnormalized step-size parameter and the $\hat{\sigma}^2_{x_{\mathcal{F},i}}(k)$'s are the power 
estimates of the samples of the filter input in the frequency domain, the $x_{\mathcal{F},i}(k)$'s. These 
estimates may be obtained using the following recursion: 

$$\hat{\sigma}^2_{x_{\mathcal{F},i}}(k) = \beta\,\hat{\sigma}^2_{x_{\mathcal{F},i}}(k-1) + (1-\beta)\,|x_{\mathcal{F},i}(k)|^2 \tag{8.64}$$

for $i = 0, 1, \ldots, N'-1$, where $\beta$ is a constant close to, but smaller than, 1. 


8.3.4 Summary of the FBLMS Algorithm 


Using the results developed in the previous sections, Table 8.1 gives a summary of the 
FBLMS algorithm. This table is in a form that can be readily converted to an efficient 
program code for implementing the FBLMS algorithm. In particular, the diagonal matrix 
$\mathcal{X}_{\mathcal{F}}(k)$ is replaced by the vector $\mathbf{x}_{\mathcal{F}}(k)$ consisting of the diagonal elements of $\mathcal{X}_{\mathcal{F}}(k)$. 
Also, $\boldsymbol{\mu}(k)$ is redefined as a column vector. Furthermore, the constraining/windowing 
operations defined by the matrices $\mathbf{P}_{0,L}$ and $\mathbf{P}_{N,0}$ are reexpressed more explicitly by 
replacing the unwanted elements of the corresponding vectors with zeros. We also use 
the terms FFT and IFFT to refer to the DFT and IDFT operations. This is to emphasize 
that, in practice, fast Fourier transform algorithms are used to perform these operations 
efficiently. 

In the derivations given in Section 8.3, it is assumed that the frequency domain 
tap-weight vector $\mathbf{w}_{\mathcal{F}}(k)$ satisfies the required time domain constraint, viz., the last L − 1 
elements of the IDFT of $\mathbf{w}_{\mathcal{F}}(k)$ are all zero. Thus, the constraint needs to be imposed 
only on the stochastic gradient vector $-2\mathcal{X}_{\mathcal{F}}^{*}(k)\mathbf{e}_{\mathcal{F}}(k)$; see Eq. (8.47). This assumption, 
although theoretically correct if $\mathbf{w}_{\mathcal{F}}(0)$ is initialized to a constraint-satisfying vector, may 
not continue to hold as the algorithm progresses. This is because the roundoff noise 
that is added to the elements of $\mathbf{w}_{\mathcal{F}}(k+1)$ will accumulate and result in a vector that may 
seriously violate the constraint after some iterations. In the case of the unconstrained FBLMS 
algorithm, these errors are compensated by the adaptation process, as they propagate back 
to themselves through the unconstrained gradient vector $-2\mathcal{X}_{\mathcal{F}}^{*}(k)\mathbf{e}_{\mathcal{F}}(k)$. However, this 
does not happen in the case of the constrained FBLMS algorithm, because the gradient vector 
$-2\mathcal{X}_{\mathcal{F}}^{*}(k)\mathbf{e}_{\mathcal{F}}(k)$ is constrained before being used for updating the tap weights. To resolve 
this problem, the tap-weight vector $\mathbf{w}_{\mathcal{F}}(k)$ should be regularly checked and constrained, 
as explained below. 

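Assume that $\mathbf{w}_{\mathcal{F}}(k)$ satisfies the required time domain constraint. That is, the last L − 1 
elements of the IDFT of $\mathbf{w}_{\mathcal{F}}(k)$ are all zero. This implies that 

$$\mathbf{w}_{\mathcal{F}}(k) = \mathcal{P}_{N,0}\mathbf{w}_{\mathcal{F}}(k) \tag{8.65}$$

Using this in the constrained FBLMS recursion (8.47), we obtain 

$$\mathbf{w}_{\mathcal{F}}(k+1) = \mathcal{P}_{N,0}\left[\mathbf{w}_{\mathcal{F}}(k) + 2\mu\,\mathcal{X}_{\mathcal{F}}^{*}(k)\mathbf{e}_{\mathcal{F}}(k)\right] \tag{8.66}$$

This recursion constrains $\mathbf{w}_{\mathcal{F}}(k+1)$ after every iteration and thus prevents any 
accumulation of roundoff noise errors. The implementation of the constrained FBLMS algorithm 
given in Table 8.1 is based on this recursion. 

Table 8.1 Summary of the FBLMS algorithm. 

Input:   Tap-weight vector, $\mathbf{w}_{\mathcal{F}}(k)$, 
         Signal power estimates, $\hat{\sigma}^2_{x_{\mathcal{F},i}}(k-1)$'s, 
         Extended input vector, 
           $\tilde{\mathbf{x}}(k) = [x(kL-N+1)\; x(kL-N+2)\; \cdots\; x(kL+L-1)]^T$, 
         and desired output vector, 
           $\mathbf{d}(k) = [d(kL)\; d(kL+1)\; \cdots\; d(kL+L-1)]^T$ 

Output:  Filter output, 
           $\mathbf{y}(k) = [y(kL)\; y(kL+1)\; \cdots\; y(kL+L-1)]^T$, 
         Tap-weight vector update, $\mathbf{w}_{\mathcal{F}}(k+1)$ 

1. Filtering: 
     $\mathbf{x}_{\mathcal{F}}(k) = \mathrm{FFT}(\tilde{\mathbf{x}}(k))$ 
     $\mathbf{y}(k)$ = the last L elements of $\mathrm{IFFT}(\mathbf{x}_{\mathcal{F}}(k) \odot \mathbf{w}_{\mathcal{F}}(k))$ 

2. Error estimation: 
     $\mathbf{e}(k) = \mathbf{d}(k) - \mathbf{y}(k)$ 

3. Step-normalization: 
     for $i = 0$ to $N'-1$ 
       $\hat{\sigma}^2_{x_{\mathcal{F},i}}(k) = \beta\,\hat{\sigma}^2_{x_{\mathcal{F},i}}(k-1) + (1-\beta)\,|x_{\mathcal{F},i}(k)|^2$ 
       $\mu_i(k) = \mu_0 / \hat{\sigma}^2_{x_{\mathcal{F},i}}(k)$ 
     $\boldsymbol{\mu}(k) = [\mu_0(k)\; \mu_1(k)\; \cdots\; \mu_{N'-1}(k)]^T$ 

4. Tap-weight adaptation: 
     $\mathbf{e}_{\mathcal{F}}(k) = \mathrm{FFT}\left(\begin{bmatrix} \mathbf{0} \\ \mathbf{e}(k) \end{bmatrix}\right)$ 
     $\mathbf{w}_{\mathcal{F}}(k+1) = \mathbf{w}_{\mathcal{F}}(k) + 2\boldsymbol{\mu}(k) \odot \mathbf{x}_{\mathcal{F}}^{*}(k) \odot \mathbf{e}_{\mathcal{F}}(k)$ 

5. Tap-weight constraint: 
     $\mathbf{w}_{\mathcal{F}}(k+1) = \mathrm{FFT}\left(\begin{bmatrix} \text{the first } N \text{ elements of } \mathrm{IFFT}(\mathbf{w}_{\mathcal{F}}(k+1)) \\ \mathbf{0} \end{bmatrix}\right)$ 

Notes: 
  • N: filter length; L: block length; $N' = N + L - 1$. 
  • $\mathbf{0}$ denotes column zero vectors with the appropriate length to extend vectors to the length $N'$. 
  • $\odot$ denotes the element-wise multiplication of vectors. 
  • Here, $\boldsymbol{\mu}(k)$ is defined as a column vector. This is different from the definition of $\boldsymbol{\mu}(k)$ in the 
    text, where it is defined as a diagonal matrix. 
  • Step 5 is applicable only for the constrained FBLMS algorithm. 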
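
The five steps of Table 8.1 can be sketched in plain Python as follows, for real-valued signals. This is only an illustrative sketch: the function name `fblms`, the all-zero initial weights, the initial power estimates of 1.0, the default value of `beta`, and the 4-tap plant in the usage lines are assumptions, not values given in the text. A naive DFT stands in for the FFT and IFFT so that the example needs nothing beyond the standard library.

```python
import cmath
import random


def dft(v):
    """Naive O(M^2) DFT, standing in for an FFT."""
    M = len(v)
    return [sum(v[m] * cmath.exp(-2j * cmath.pi * l * m / M) for m in range(M))
            for l in range(M)]


def idft(V):
    """Naive inverse DFT, standing in for an IFFT."""
    M = len(V)
    return [sum(V[l] * cmath.exp(2j * cmath.pi * l * m / M) for l in range(M)) / M
            for m in range(M)]


def fblms(x, d, N, L, mu0, beta=0.9, constrained=True):
    """Step-normalized FBLMS following the steps of Table 8.1 (real signals)."""
    Np = N + L - 1
    w_F = [0j] * Np                    # assumed initial frequency domain weights
    power = [1.0] * Np                 # assumed initial power estimates
    history = [0.0] * (N - 1)          # N-1 samples carried over between blocks
    y_all = []
    for k in range(len(x) // L):
        x_ext = history + list(x[k * L:(k + 1) * L])
        history = x_ext[L:]
        d_k = d[k * L:(k + 1) * L]
        # 1. Filtering: keep the last L samples of the circular convolution.
        x_F = dft(x_ext)
        y_k = [v.real for v in idft([a * b for a, b in zip(x_F, w_F)])[-L:]]
        # 2. Error estimation.
        e_k = [dv - yv for dv, yv in zip(d_k, y_k)]
        # 3. Step-normalization, Eqs. (8.63) and (8.64).
        power = [beta * p + (1.0 - beta) * abs(xf) ** 2 for p, xf in zip(power, x_F)]
        mu = [mu0 / p for p in power]
        # 4. Tap-weight adaptation with the zero-extended error vector.
        e_F = dft([0.0] * (N - 1) + e_k)
        w_F = [w + 2.0 * m * xf.conjugate() * ef
               for w, m, xf, ef in zip(w_F, mu, x_F, e_F)]
        # 5. Tap-weight constraint (constrained FBLMS only).
        if constrained:
            w_time = idft(w_F)
            w_F = dft(w_time[:N] + [0.0] * (L - 1))
        y_all.extend(y_k)
    return w_F, y_all


# Usage: identify a hypothetical 4-tap plant from noise-free data.
random.seed(1)
plant = [0.8, -0.4, 0.2, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(4000)]
d = [sum(plant[j] * (x[n - j] if n - j >= 0 else 0.0) for j in range(4))
     for n in range(len(x))]
w_F, y = fblms(x, d, N=4, L=4, mu0=0.05)
w_time = [v.real for v in idft(w_F)[:4]]   # time domain tap weights
```

Calling `fblms(..., constrained=False)` skips step 5 and gives the unconstrained recursion with step-normalization.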

Before ending this section, some remarks on the real- and complex-valued signal cases 
are in order. Although all the derivations in this chapter are given for real-valued 
signals to avoid unnecessary confusion, the final algorithm presented 
in Table 8.1 is applicable to both real- and complex-valued signals. Another point to be 
noted in the case of real-valued signals is that all the frequency domain vectors will be 
conjugate symmetric.³ This implies that the first half of each of these frequency domain vectors 
contains all the necessary information and, hence, the second halves can be ignored. This 
reduces the computational complexity and memory requirement of the FBLMS algorithm 
by about 50%. 
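
The conjugate symmetry, and the fact that the first half of a frequency domain vector suffices, can be verified with a short Python sketch. The data vector and the variable names below are hypothetical, and a naive DFT again stands in for the FFT.

```python
import cmath


def dft(v):
    """Naive O(M^2) DFT, standing in for an FFT."""
    M = len(v)
    return [sum(v[m] * cmath.exp(-2j * cmath.pi * l * m / M) for m in range(M))
            for l in range(M)]


x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5, 2.5]    # hypothetical real-valued vector
M = len(x)
X = dft(x)

# Conjugate symmetry: X[i] equals the conjugate of X[M - i] for i = 1, ..., M-1.
sym_err = max(abs(X[i] - X[M - i].conjugate()) for i in range(1, M))

# Keep bins 0 .. floor(M/2) only, then rebuild the discarded second half.
half = X[: M // 2 + 1]
rebuilt = half + [half[M - i].conjugate() for i in range(M // 2 + 1, M)]
rebuild_err = max(abs(a - b) for a, b in zip(rebuilt, X))
```

Only `M // 2 + 1` of the M complex bins need to be stored and updated, which is where the saving of roughly one half comes from.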


8.3.5  FBLMS Misadjustment Equations 


Derivation of the misadjustment equations for the various implementations of the FBLMS 
algorithms is quite tedious and long. This is done in Appendix 8B. The derivations 
presented in Appendix 8B result in the following misadjustment equations: 


Mesims © HN, (0) (8.67) 
Misims © HN", (0) (8.68) 
Mesims © HoN/N' (8.69) 
Mesias © Mo (8.70) 


In these equations, superscripts c and u refer to the constrained and unconstrained versions
of the FBLMS algorithm, respectively, and the superscript “n” indicates that
step-normalization has been applied. We also note that, similar to Eqs. (6.64) and (8.13),
Eqs. (8.67)-(8.70) are valid only for misadjustment values of 10% or lower.

It can be immediately concluded from Eqs. (8.67)-(8.70) that the constrained FBLMS
algorithm outperforms its unconstrained counterpart, in the sense that the former results
in a lower misadjustment for a given step-size parameter. Equivalently, for a given
misadjustment, the constrained FBLMS algorithm converges faster than its unconstrained
counterpart. The difference between the two algorithms is determined by the ratio
$N'/N = (N+L-1)/N$ which, in turn, is determined by the ratio $L/N$. Clearly, when
$L \ll N$, $N'/N \approx 1$, and the difference between the constrained FBLMS algorithm
and its unconstrained counterpart becomes insignificant. On the other hand, when $L$ and
$N$ are comparable, the difference between the two algorithms will be significant.
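To put numbers on this comparison, the short sketch below evaluates the step-normalized expressions (8.69) and (8.70) for two block lengths. The function name `fblms_misadjustments` and the chosen values of $N$, $L$, and $\mu_o$ are illustrative assumptions.

```python
def fblms_misadjustments(mu_o, N, L):
    """Step-normalized FBLMS misadjustments from Eqs. (8.69) and (8.70)."""
    n_prime = N + L - 1                      # N' = N + L - 1
    return {
        "constrained": mu_o * N / n_prime,   # Eq. (8.69)
        "unconstrained": mu_o,               # Eq. (8.70)
        "ratio": n_prime / N,                # unconstrained / constrained
    }

# L much smaller than N: the two versions behave almost identically.
short_block = fblms_misadjustments(mu_o=0.1, N=1985, L=64)

# L equal to N: the unconstrained misadjustment is nearly twice as large.
long_block = fblms_misadjustments(mu_o=0.1, N=1024, L=1024)
```

With $N = 1985$ and $L = 64$ the ratio is $2048/1985 \approx 1.03$, whereas with $L = N = 1024$ it is $2047/1024 \approx 2$.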


8.3.6 Selection of the Block Length 


Block processing of signals, in general, results in a certain time delay at the system output.
In many applications, this processing delay may be intolerable and hence it has to be
minimized. It arises because a block of samples of the input signal has to be collected before
the processing of the data can begin. Consequently, the processing delay increases with
block length. On the other hand, the per-sample computational complexity of a block
processing system varies with the block length, $L$. For values of $L$ smaller than the filter
length, $N$, the per-sample computational complexity of the FBLMS algorithm decreases as
$L$ increases. It reaches close to its minimum when $L \approx N$. Thus, in applications where
the processing delay is not an issue, $L$ is usually chosen close to $N$. The exact value
of $L$ depends on $N$. For a given $N$, one should choose $L$ so that $N' = N + L - 1$ is
an appropriate composite number and efficient FFT and IFFT algorithms can be used
in the realization of the FBLMS algorithm. On the other hand, in applications where it
is important to keep the processing delay small, one may need to strike a compromise
between system complexity and processing delay. In such applications, an alternative
implementation of the FBLMS algorithm, which is introduced in the next section, is
found to be more efficient.

³ A length $M$ vector $\mathbf{u} = [u_0\ u_1\ \cdots\ u_{M-1}]^{\mathrm T}$ is called conjugate symmetric when
$u_i = u^{*}_{M-i}$, for $i = 0, 1, \ldots, \lfloor M/2 \rfloor$, where $\lfloor x \rfloor$ denotes the integer part of $x$.


8.4 The Partitioned FBLMS Algorithm 


When the filter length, N, is large and a block length, L, much smaller than N is used, an 
efficient implementation of the FBLMS algorithm can be derived by dividing (partitioning) 
the convolution sum of Eq. (8.17) into a number of smaller sums and proceeding as 
discussed below. The resulting implementation is called partitioned FBLMS (PFBLMS) 
algorithm. 

The PFBLMS algorithm has apparently been discovered independently by a number of
researchers and has been given different names: Asharif et al. (1986) and Asharif and
Amano (1994) call it frequency bin adaptive filtering; Soo and Pang (1987, 1990) refer
to it as multidelay FBLMS; and Sommen (1989) uses the name PFBLMS.

Let us assume that $N = P \cdot M$, where $P$ and $M$ are integers, and note that the
convolution sum of Eq. (8.17) may be written as

$$y(n) = \sum_{l=0}^{P-1} y_l(n) \tag{8.71}$$

where

$$y_l(n) = \sum_{i=0}^{M-1} w_{i+lM}\, x(n - lM - i) \tag{8.72}$$


To develop a frequency domain implementation of these convolutions, we choose a block
length $L = M$ and divide the input data into blocks of length $2M$ samples such that the last
$M$ samples of, say, the $k$th block are the same as the first $M$ samples of the $(k+1)$th block.
Then, the convolution sum in Eq. (8.72) can be evaluated using the circular convolution
of these data blocks with the appropriate weight vectors, after the latter are padded with
$M$ zeros. Using $x(kM + M - 1)$ to represent the newest sample in the input, we define the vectors

$$\mathbf{x}_{F,l}(k) = \mathrm{FFT}\left([x((k-l)M - M)\ \ x((k-l)M - M + 1)\ \ \cdots\ \ x((k-l)M + M - 1)]^{\mathrm T}\right) \tag{8.73}$$

$$\mathbf{w}_{F,l}(k) = \mathrm{FFT}\left([w_{lM}(k)\ \ w_{lM+1}(k)\ \ \cdots\ \ w_{lM+M-1}(k)\ \ \underbrace{0\ \ 0\ \ \cdots\ \ 0}_{M\ \text{zeros}}]^{\mathrm T}\right) \tag{8.74}$$

$$\mathbf{y}_l(k) = [y_l(kM)\ \ y_l(kM+1)\ \ \cdots\ \ y_l(kM + M - 1)]^{\mathrm T} \tag{8.75}$$


and note that 


$$\mathbf{y}_l(k) = \text{the last } M \text{ elements of } \mathrm{IFFT}\left(\mathbf{w}_{F,l}(k) \odot \mathbf{x}_{F,l}(k)\right) \tag{8.76}$$

where $\odot$ denotes multiplication on an element-wise basis, $k$, as before, is the block index,
and $l$ is the partition index. We also define

$$\mathbf{y}(k) = [y(kM)\ \ y(kM+1)\ \ \cdots\ \ y(kM + M - 1)]^{\mathrm T} \tag{8.77}$$

and note that

$$\mathbf{y}(k) = \sum_{l=0}^{P-1} \mathbf{y}_l(k) \tag{8.78}$$

Furthermore, from Eq. (8.73), we note that

$$\mathbf{x}_{F,l}(k) = \mathbf{x}_{F,0}(k - l) \tag{8.79}$$


It may be noted that according to the derivations in the earlier sections, for a filter 
(here, partition) length of M and a block length of L = M, the frequency domain vectors 
of length 2M — 1 are sufficient to perform the necessary convolutions in the frequency 
domain. Here, we are using vectors that are of length 2M as this greatly simplifies the 
implementation of the PFBLMS algorithm. In particular, we note that Eq. (8.79) holds 
only when L = M. 

Substituting Eq. (8.76) in Eq. (8.78), interchanging the order of summation and IFFT,
and using Eq. (8.79), we obtain

$$\mathbf{y}(k) = \text{the last } M \text{ elements of } \mathrm{IFFT}\left(\sum_{l=0}^{P-1} \mathbf{w}_{F,l}(k) \odot \mathbf{x}_{F,0}(k - l)\right) \tag{8.80}$$


Using this result, the block diagram of the PFBLMS algorithm may be proposed, as
depicted in Figure 8.4. Here, the delays, $z^{-1}$, are in units of one block and the
thick lines represent frequency domain vectors. Also, for our later discussion, it may
be remarked here that the implementation of the summation on the right-hand side of 
Eq. (8.80) can also be considered as a parallel bank of 2M transversal filters, each of 
length P, with the jth filter processing the frequency domain samples belonging to the 
jth frequency bin, for j = 0,1,...,2M — 1. 
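Equation (8.80) can be checked against the direct convolution sum for one output block. The sketch below is illustrative only: the helper names (`dft`, `sample`, `block_output_partitioned`, `block_output_direct`), the tap values, and the test input are assumptions, and a direct $O(n^2)$ DFT stands in for the FFT so that the example needs no external library.

```python
import cmath

def dft(v, inverse=False):
    """Direct O(n^2) DFT (or inverse DFT); fine for small demonstration sizes."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[m] * cmath.exp(sign * 2j * cmath.pi * i * m / n) for m in range(n))
           for i in range(n)]
    return [c / n for c in out] if inverse else out

def sample(x, n):
    """x(n), taken as zero before the start of the record."""
    return x[n] if n >= 0 else 0.0

def block_output_partitioned(x, w, P, M, k):
    """y(k) from Eq. (8.80) with L = M, using length-2M transforms."""
    acc = [0j] * (2 * M)
    for l in range(P):
        start = (k - l) * M - M                       # first sample of x_{F,0}(k - l)
        x_F = dft([sample(x, n) for n in range(start, start + 2 * M)])
        w_F = dft(list(w[l * M:(l + 1) * M]) + [0.0] * M)   # partition l, zero padded
        acc = [a + wf * xf for a, wf, xf in zip(acc, w_F, x_F)]
    return [c.real for c in dft(acc, inverse=True)[M:]]     # keep the last M elements

def block_output_direct(x, w, M, k):
    """Same block computed from the plain convolution sum."""
    return [sum(w[t] * sample(x, n - t) for t in range(len(w)))
            for n in range(k * M, k * M + M)]

P, M, k = 3, 4, 4                                     # N = P * M = 12 taps
w = [0.9, -0.4, 0.3, 0.1, -0.2, 0.5, 0.05, -0.6, 0.2, 0.0, -0.1, 0.7]
x = [((7 * n) % 11 - 5) / 5.0 for n in range((k + 1) * M)]   # arbitrary real input
y_fd = block_output_partitioned(x, w, P, M, k)
y_td = block_output_direct(x, w, M, k)
```

The two outputs agree to within rounding error. In an actual implementation each $\mathbf{x}_{F,0}(k)$ is computed once and then reused from the delay line, which is the saving that Eq. (8.79) provides.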
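The adaptation of the filter tap weights is done according to the recursions

$$\mathbf{w}_{F,l}(k+1) = \mathbf{w}_{F,l}(k) + 2\boldsymbol{\mu}(k) \odot \mathbf{x}^{*}_{F,0}(k - l) \odot \mathbf{e}_F(k), \quad \text{for } l = 0, 1, \ldots, P-1 \tag{8.81}$$

where $\boldsymbol{\mu}(k)$ is the vector of the associated step-size parameters that may be normalized
in a similar manner as Eq. (8.63),

$$\mathbf{e}_F(k) = \mathrm{FFT}\left(\begin{bmatrix}\mathbf{0}\\ \mathbf{d}(k) - \mathbf{y}(k)\end{bmatrix}\right) \tag{8.82}$$

$\mathbf{d}(k) = [d(kM)\ \ d(kM+1)\ \ \cdots\ \ d(kM + M - 1)]^{\mathrm T}$, and $\mathbf{0}$ is the length $M$ zero column
vector.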

Recursion (8.81) corresponds to the unconstrained PFBLMS algorithm. The constrained
PFBLMS algorithm recursion is obtained by constraining the filter tap weights after every
iteration of Eq. (8.81).

Figure 8.4 The partitioned FBLMS (PFBLMS) algorithm for L = M. (S/P: serial-to-parallel; P/S: parallel-to-serial.)


8.4.1 Analysis of the PFBLMS Algorithm


In this section, we analyze the convergence behavior of the PFBLMS algorithm. This
analysis reveals that the PFBLMS algorithm suffers from slow convergence, and we
therefore suggest some simple solutions to improve its convergence. The main emphasis
of this section is the convergence behavior of the unconstrained PFBLMS algorithm.
However, we also make some comments on the behavior of the constrained PFBLMS algorithm.

From the analysis of the FBLMS algorithm, we recall that the frequency domain samples 
of input which belong to different frequency bins (i.e., the signal samples at the output 
of the first FFT in Figure 8.3) are approximately uncorrelated with one another and 
hence, the associated correlation matrix may be approximated by a diagonal matrix. The 
step-normalization is then used to equalize the time constants of the various modes of 
convergence of the algorithm. 

Extending the above result to the PFBLMS structure, we find that in this case, there are
$2M$ parallel transversal filters (one belonging to each frequency bin of the signal samples)
whose associated input sequences are approximately uncorrelated with one another. Thus,
a simple approach to analyze the PFBLMS algorithm is to assume that the transversal
filters associated with the different bins converge independently of one another so that we
can concentrate on the convergence behavior of these as independent filters. We use the term
frequency bin filter to refer to these independent filters. We note that the analysis of the
PFBLMS algorithm based on the assumption of independent frequency bins is rather
coarse. However, a more exact analysis of the PFBLMS algorithm would be quite involved
and is beyond the scope of this book.

From Eq. (8.73) and Figure 8.4, we note that the tap-input vector of the $i$th frequency
bin filter is

$$\mathbf{x}^{F}_{i}(k) = [x_{F,0,i}(k)\ \ x_{F,0,i}(k-1)\ \ \cdots\ \ x_{F,0,i}(k - P + 1)]^{\mathrm T} \tag{8.83}$$

where $x_{F,0,i}(k)$ is the $i$th element of $\mathbf{x}_{F,0}(k)$. Convergence behavior of the $i$th frequency
bin filter is then determined by the eigenvalue spread of the correlation matrix

$$\mathbf{R}^{F}_{xx,i} = E\left[\mathbf{x}^{F}_{i}(k)\,[\mathbf{x}^{F}_{i}(k)]^{\mathrm H}\right] \tag{8.84}$$

or, equivalently, by its normalized version

$$\mathbf{R}^{Fn}_{xx,i} = \left(\mathrm{diag}[\mathbf{R}^{F}_{xx,i}]\right)^{-1} \mathbf{R}^{F}_{xx,i} \tag{8.85}$$

However, we note that when the input process, $x(n)$, is stationary, the diagonal elements
of $\mathbf{R}^{F}_{xx,i}$ are all identical and thus, $\mathrm{diag}[\mathbf{R}^{F}_{xx,i}]$ is proportional to the identity matrix.
Hence, $\mathbf{R}^{F}_{xx,i}$ and $\mathbf{R}^{Fn}_{xx,i}$ have the same eigenvalue spread.

Now, observe the fact that the matrix $\mathbf{R}^{F}_{xx,i}$ is a subdiagonal part of the dual of matrix
$\mathbf{R}^{u}_{xx}$, which was obtained in Section 8.3.2 while analyzing the unconstrained FBLMS
algorithm (8.58). As a result, the following analysis is applicable only to the
unconstrained PFBLMS algorithm as it is based on a study of the matrix $\mathbf{R}^{F}_{xx,i}$. The constrained
PFBLMS algorithm requires further attention and we will make some comments on its
convergence behavior at the end of this subsection. A modified version of the constrained
PFBLMS algorithm, with significantly less computational complexity, will be introduced
in Section 8.4.5.

In order to keep the analysis simple, we consider the case where the input sequence, 
x(n), is white. Even though this assumption simplifies the analysis greatly, the results 
obtained are still able to bring out the salient features of the algorithm. For example, the 
computer simulations given in the next section show that the conclusions drawn in this 
section remain valid even when x(n) is highly colored. 

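The $i$th element of $\mathbf{x}_{F,0}(k)$ (i.e., the $i$th frequency bin sample of the filter input) is

$$x_{F,0,i}(k) = \sum_{m=0}^{2M-1} x(kM - M + m)\, e^{-j2\pi i m/2M} \tag{8.86}$$

When $x(n)$ is white, it is straightforward to show that

$$E[x_{F,0,i}(k - l)\, x^{*}_{F,0,i}(k - m)] = \begin{cases} 2M\sigma_x^2, & \text{for } l = m \\ (-1)^i \times M\sigma_x^2, & \text{for } l = m \pm 1 \\ 0, & \text{otherwise} \end{cases} \tag{8.87}$$

where $\sigma_x^2$ is the variance of $x(n)$. Using this result, we obtain

$$\mathbf{R}^{Fn}_{xx,i} = \begin{bmatrix} 1 & \alpha_i & 0 & \cdots & 0 & 0 \\ \alpha_i & 1 & \alpha_i & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \alpha_i & 1 \end{bmatrix} \tag{8.88}$$

where $\alpha_i = (-1)^i \times 0.5$.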
The eigenvalues of $\mathbf{R}^{Fn}_{xx,i}$ (which are independent of $i$) can be obtained numerically.
These are presented in Table 8.2, for values of $P$ in the range of 2 to 10. It is noted
that these are widely spread and their dispersion increases significantly as $P$ grows.
This means that for large values of $P$, the PFBLMS algorithm may suffer from
slow convergence and/or numerical instability, as $\mathbf{R}^{Fn}_{xx,i}$ in Eq. (8.88) becomes badly
ill-conditioned for large $P$.

Table 8.2 Eigenvalues of $\mathbf{R}^{Fn}_{xx,i}$ for different numbers of partitions, P.

P               2      3      4      5      6      7      8      9      10
λ_1           1.500  1.707  1.809  1.866  1.901  1.924  1.940  1.951  1.959
λ_2           0.500  1.000  1.309  1.500  1.623  1.707  1.766  1.809  1.841
λ_3                  0.293  0.691  1.000  1.222  1.383  1.500  1.588  1.655
λ_4                         0.191  0.500  0.778  1.000  1.174  1.309  1.415
λ_5                                0.134  0.376  0.617  0.826  1.000  1.142
λ_6                                       0.099  0.293  0.500  0.691  0.858
λ_7                                              0.076  0.234  0.412  0.585
λ_8                                                     0.060  0.191  0.345
λ_9                                                            0.049  0.159
λ_10                                                                  0.041
λ_max/λ_min   3      5.828  9.472  13.93  19.20  25.27  32.16  39.86  48.37

Observe from the PFBLMS structure shown in Figure 8.4 that the successive partitions
of the input samples are 50% overlapped. The value of $|\alpha_i| = 0.5$ in Eq. (8.88), which in
turn results in the large eigenvalue spread in $\mathbf{R}^{Fn}_{xx,i}$, is a direct consequence of this 50%
overlapping. Numerical studies show that this eigenvalue spread reduces as $|\alpha_i|$ decreases.
Furthermore, $|\alpha_i|$ can be reduced by reducing the amount of overlap of the successive
partitions of the input samples. This is easily achieved by choosing a block length, $L$,
smaller than the partition length, $M$, as explained in the next section.
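The link between overlap and $|\alpha_i|$ can be illustrated by counting the samples shared by two successive partitions. The sketch below is a hedged illustration that assumes white, unit-variance input and the block layout used in the next section (blocks of length $M + L$ starting at sample $kL - M$); the function name `normalized_bin_correlation` and its arguments are hypothetical.

```python
import cmath

def normalized_bin_correlation(M, L, i, q):
    """E[X_i(k) X_i*(k - q)] / E[|X_i(k)|^2] for white, unit-variance input.

    X_i(k) is bin i of the DFT of the length-(M + L) block starting at sample
    kL - M. Only samples shared by the two blocks contribute to the expectation.
    """
    n = M + L
    total = 0j
    for m in range(n):                 # position of a sample inside block k
        m_old = m + q * L              # position of the same sample inside block k - q
        if 0 <= m_old < n:
            total += (cmath.exp(-2j * cmath.pi * i * m / n)
                      * cmath.exp(2j * cmath.pi * i * m_old / n))
    return total / n

# L = M (p = 1): successive partitions are one block apart and overlap by 50%.
alpha_even = normalized_bin_correlation(M=4, L=4, i=2, q=1)
alpha_odd = normalized_bin_correlation(M=4, L=4, i=3, q=1)

# M = 3L (p = 3): successive partitions are p blocks apart and share only L samples.
alpha_p3 = normalized_bin_correlation(M=12, L=4, i=3, q=3)

# Partitions two steps apart (q = 2p) share no samples at all.
alpha_far = normalized_bin_correlation(M=12, L=4, i=3, q=6)
```

For $L = M$ the result is $\pm 0.5$, in agreement with $\alpha_i = (-1)^i \times 0.5$, while for $M = 3L$ the magnitude falls to $L/(M+L) = 0.25$.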

Before proceeding with this modification of the PFBLMS algorithm, we shall make
some comments on the convergence behavior of the constrained PFBLMS algorithm. As
was noted before, the correlation matrix $\mathbf{R}^{F}_{xx,i}$ was the outcome of an analysis of the
unconstrained PFBLMS algorithm. A detailed examination of the constrained PFBLMS
algorithm is rather involved and beyond the scope of this book. As we will demonstrate
through computer simulations later, the effect of overlapping of successive blocks is
resolved when the tap weights of the filter are constrained. As a result, we find that the
constrained PFBLMS algorithm does not have any convergence problem. It converges
almost as fast as its nonpartitioned counterpart. These observations are in line with the
theoretical findings presented in Chan (2000) and also in Chan and
Farhang-Boroujeny (2001).


8.4.2 PFBLMS Algorithm with M > L 
Assuming a block length $L$ and a partition length $M$, define the vector

$$\tilde{\mathbf{x}}_0(k) = [x(kL - M)\ \ x(kL - M + 1)\ \ \cdots\ \ x(kL + L - 1)]^{\mathrm T} \tag{8.89}$$

Let us choose $M = pL$, where $p$ is an integer. As we show later, this choice of $L$
and $M$ leads to an efficient implementation of the PFBLMS algorithm. We note that
if we want to use the DFT to compute the partitions in Eq. (8.71), then $\tilde{\mathbf{x}}_0(k)$
corresponds to the vector of input samples associated with the first partition, that is, $y_0(n)$
in Eq. (8.71) with $n = kL + L - 1$. Observe that the first element of $\tilde{\mathbf{x}}_0(k)$ is
$x(kL - M)$. Similarly, the vectors corresponding to the subsequent partitions start with samples
$x(kL - 2M) = x((k - p)L - M)$, $x(kL - 3M) = x((k - 2p)L - M)$, and so on. We thus
find that $\tilde{\mathbf{x}}_l(k) = \tilde{\mathbf{x}}_0(k - pl)$ or

$$\mathbf{x}_{F,l}(k) = \mathbf{x}_{F,0}(k - pl), \quad \text{for } l = 1, 2, \ldots, P-1 \tag{8.90}$$

Figure 8.5 The PFBLMS algorithm for M = pL. (S/P: serial-to-parallel; P/S: parallel-to-serial.)


Using this result, Figure 8.5 depicts an implementation of the PFBLMS algorithm when 
M = pL. Comparing Figures 8.4 and 8.5, we find that the major difference between 
the two structures is that each delay unit in Figure 8.4 is replaced by p delay units in 
Figure 8.5. Table 8.3 gives a summary of the PFBLMS algorithm for the case where 
M=pL. 

Following the notations used before, we note that, for the new arrangement in
Figure 8.5,

$$\mathbf{x}^{F}_{i}(k) = [x_{F,0,i}(k)\ \ x_{F,0,i}(k - p)\ \ \cdots\ \ x_{F,0,i}(k - (P-1)p)]^{\mathrm T} \tag{8.91}$$

Also,

$$x_{F,0,i}(k) = \sum_{m=0}^{M+L-1} x(kL - M + m)\, e^{-j2\pi i m/(M+L)} \tag{8.92}$$


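Assuming that $x(n)$ is white, we obtain from Eq. (8.92)

$$E[x_{F,0,i}(k - pl)\, x^{*}_{F,0,i}(k - pm)] = \begin{cases} (M + L)\sigma_x^2, & \text{for } l = m \\ e^{j\frac{2\pi p(m-l)i}{p+1}}\, L\sigma_x^2, & \text{for } l = m \pm 1 \\ 0, & \text{otherwise.} \end{cases} \tag{8.93}$$

Table 8.3 Summary of the PFBLMS algorithm.

Input: Tap-weight vectors, $\mathbf{w}_{F,l}(k)$, $l = 0, 1, \ldots, P-1$,
Extended input vector,
$\tilde{\mathbf{x}}_0(k) = [x(kL - M)\ \ x(kL - M + 1)\ \ \cdots\ \ x(kL + L - 1)]^{\mathrm T}$,
The past frequency domain vectors of input, $\mathbf{x}_{F,0}(k - l)$, for $l = 1, 2, \ldots, (P-1)p$,
and desired output vector,
$\mathbf{d}(k) = [d(kL)\ \ d(kL+1)\ \ \cdots\ \ d(kL + L - 1)]^{\mathrm T}$.

Output: Filter output, $\mathbf{y}(k) = [y(kL)\ \ y(kL+1)\ \ \cdots\ \ y(kL + L - 1)]^{\mathrm T}$,
Tap-weight vector update, $\mathbf{w}_{F,l}(k+1)$, $l = 0, 1, \ldots, P-1$.

1. Filtering:

$$\mathbf{x}_{F,0}(k) = \mathrm{FFT}(\tilde{\mathbf{x}}_0(k))$$

$$\mathbf{y}(k) = \text{the last } L \text{ elements of } \mathrm{IFFT}\left(\sum_{l=0}^{P-1} \mathbf{w}_{F,l}(k) \odot \mathbf{x}_{F,0}(k - pl)\right)$$

2. Error estimation:

$$\mathbf{e}(k) = \mathbf{d}(k) - \mathbf{y}(k)$$

3. Step-normalization:
for $i = 0$ to $M' - 1$

$$\hat{\sigma}^2_{x_F,i}(k) = \beta \hat{\sigma}^2_{x_F,i}(k-1) + (1 - \beta)|x_{F,0,i}(k)|^2$$

$$\mu_i(k) = \mu_o / \hat{\sigma}^2_{x_F,i}(k)$$

$$\boldsymbol{\mu}(k) = [\mu_0(k)\ \ \mu_1(k)\ \ \cdots\ \ \mu_{M'-1}(k)]^{\mathrm T}$$

4. Tap-weight adaptation:

$$\mathbf{e}_F(k) = \mathrm{FFT}\left(\begin{bmatrix}\mathbf{0}\\ \mathbf{e}(k)\end{bmatrix}\right)$$

for $l = 0$ to $P - 1$

$$\mathbf{w}_{F,l}(k+1) = \mathbf{w}_{F,l}(k) + 2\boldsymbol{\mu}(k) \odot \mathbf{x}^{*}_{F,0}(k - pl) \odot \mathbf{e}_F(k)$$

5. Tap-weight constraint:
for $l = 0$ to $P - 1$

$$\mathbf{w}_{F,l}(k+1) = \mathrm{FFT}\left(\begin{bmatrix}\text{first } M \text{ elements of } \mathrm{IFFT}(\mathbf{w}_{F,l}(k+1))\\ \mathbf{0}\end{bmatrix}\right)$$

Notes:

• $M$: partition length; $L$: block length; $M' = M + L$.
• $\mathbf{0}$ denotes column zero vectors with appropriate length to extend vectors to the length of $M'$.
• $\odot$ denotes element-wise multiplication of vectors.
• Step 5 is applicable only for the constrained PFBLMS algorithm.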
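The steps of Table 8.3 can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the book's MATLAB implementation: the function `pfblms`, its default parameters, the noise-free test plant, and the initial power estimate (which presumes a roughly unit-variance input) are all hypothetical choices, and a direct $O(n^2)$ DFT replaces the FFT to keep the example free of external libraries.

```python
import cmath
import random

def dft(v, inverse=False):
    """Direct O(n^2) DFT (or inverse DFT); adequate for the tiny sizes used here."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[m] * cmath.exp(sign * 2j * cmath.pi * i * m / n) for m in range(n))
           for i in range(n)]
    return [c / n for c in out] if inverse else out

def pfblms(x, d, P, M, L, mu_o=0.05, beta=0.98, constrained=False):
    """Step-normalized PFBLMS for M = p * L; returns the a priori error sequence."""
    assert M % L == 0
    p, Mp = M // L, M + L                                # Mp plays the role of M'
    W = [[0j] * Mp for _ in range(P)]                    # w_{F,l}(k), l = 0..P-1
    hist = [[0j] * Mp for _ in range(p * (P - 1) + 1)]   # hist[q] = x_{F,0}(k - q)
    power = [float(Mp)] * Mp      # initial power estimate; assumes unit-variance input
    errors = []
    for k in range(len(x) // L):
        # Extended input vector [x(kL - M) ... x(kL + L - 1)], zero before time 0.
        block = [x[n] if n >= 0 else 0.0 for n in range(k * L - M, k * L + L)]
        hist = [dft(block)] + hist[:-1]
        # Step 1: filtering.
        Y = [sum(W[l][i] * hist[p * l][i] for l in range(P)) for i in range(Mp)]
        y = [c.real for c in dft(Y, inverse=True)[M:]]
        # Step 2: error estimation.
        e = [d[k * L + j] - y[j] for j in range(L)]
        errors.extend(e)
        # Step 3: step-normalization.
        power = [beta * s + (1 - beta) * abs(c) ** 2 for s, c in zip(power, hist[0])]
        mu = [mu_o / s for s in power]
        # Step 4: tap-weight adaptation.
        E = dft([0.0] * M + e)
        for l in range(P):
            X = hist[p * l]
            W[l] = [W[l][i] + 2 * mu[i] * X[i].conjugate() * E[i] for i in range(Mp)]
            # Step 5: tap-weight constraint (constrained version only).
            if constrained:
                w_time = dft(W[l], inverse=True)
                W[l] = dft(w_time[:M] + [0j] * L)
    return errors

# Hypothetical noise-free modeling test with P = 3, M = 4, L = 2 (p = 2).
random.seed(1)
P, M, L = 3, 4, 2
plant = [random.gauss(0.0, 1.0) for _ in range(P * M)]
x = [random.gauss(0.0, 1.0) for _ in range(8000)]
d = [sum(plant[t] * x[n - t] for t in range(len(plant)) if n - t >= 0)
     for n in range(len(x))]
err_u = pfblms(x, d, P, M, L, constrained=False)
err_c = pfblms(x, d, P, M, L, constrained=True)
```

Because the desired signal here contains no plant noise and the plant length equals $PM$, both error sequences should decay toward zero; with plant noise added, the residual error would instead settle near the noise floor plus the misadjustment.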




Table 8.4 Eigenvalue spread, λ_max/λ_min, of $\mathbf{R}^{Fn}_{xx,i}$ for P = 10 and different values of p.

p              1      2      3      4      5      6      7      8      9      10
λ_max/λ_min  48.37   4.55   2.84   2.25   1.94   1.75   1.63   1.54   1.47   1.42


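Using this, we get

$$\mathbf{R}^{Fn}_{xx,i} = \begin{bmatrix} 1 & \alpha_i & 0 & \cdots & 0 & 0 \\ \alpha_i^{*} & 1 & \alpha_i & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \alpha_i^{*} & 1 \end{bmatrix} \tag{8.94}$$

where $\alpha_i = \dfrac{1}{p+1}\, e^{j\frac{2\pi p i}{p+1}}$.

The eigenvalue spread of $\mathbf{R}^{Fn}_{xx,i}$ (which is independent of $i$) for values of $p$ changing
from 1 to 10 and a fixed value of $P = 10$ is given in Table 8.4. The results clearly show
that reducing the overlap of successive partitions significantly improves the convergence
behavior of the unconstrained PFBLMS algorithm.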
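The eigenvalue spreads can be reproduced without a numerical eigen-solver. The sketch below relies on a standard result that is not stated in the text: a $P \times P$ Hermitian tridiagonal Toeplitz matrix with unit diagonal and off-diagonal magnitude $|\alpha|$ has eigenvalues $1 + 2|\alpha|\cos(j\pi/(P+1))$, $j = 1, \ldots, P$. The function names are illustrative.

```python
import math

def bin_filter_eigenvalues(P, p):
    """Eigenvalues of the P-by-P normalized bin-filter correlation matrix.

    Uses the closed form for a Hermitian tridiagonal Toeplitz matrix with unit
    diagonal and off-diagonal magnitude |alpha| = 1 / (p + 1).
    """
    alpha = 1.0 / (p + 1)
    return [1.0 + 2.0 * alpha * math.cos(j * math.pi / (P + 1)) for j in range(1, P + 1)]

def eigenvalue_spread(P, p):
    """Ratio of the largest to the smallest eigenvalue."""
    lam = bin_filter_eigenvalues(P, p)
    return max(lam) / min(lam)

# Spread for P = 10 partitions and p = 1..10.
spreads = [round(eigenvalue_spread(10, p), 2) for p in range(1, 11)]
```

Setting $p = 1$ gives the 50% overlap case of Eq. (8.88), so the same function also covers the spreads for $L = M$.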


8.4.3 PFBLMS Misadjustment Equations 


The following results can be derived for the PFBLMS algorithm by following the same 
line of derivations as in Appendix 8B. 


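$$\mathcal{M}^{c}_{\mathrm{PFBLMS}} \approx \mu P M \phi_{xx}(0) \tag{8.95}$$

$$\mathcal{M}^{u}_{\mathrm{PFBLMS}} \approx \mu P (M + L) \phi_{xx}(0) \tag{8.96}$$

$$\mathcal{M}^{cn}_{\mathrm{PFBLMS}} \approx \mu_o P \frac{M}{M + L} \tag{8.97}$$

$$\mathcal{M}^{un}_{\mathrm{PFBLMS}} \approx \mu_o P \tag{8.98}$$

As in Section 8.3.5, here also we find that the constrained PFBLMS algorithm achieves a
lower level of misadjustment compared to its unconstrained counterpart. The price paid
for this is a higher computational complexity.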


8.4.4 Computational Complexity and Memory Requirement 


In this section, we give some figures indicating the computational complexity and
memory requirement of the PFBLMS algorithm. Instead of specifying the exact number of
multiplications and additions, we specify a macrofigure, namely the number of butterflies,
for quantifying computational complexity, as this may be more meaningful in the case of
such algorithms. To estimate the memory requirement, we consider only its major blocks
and ignore details such as the temporary memory locations required, as these will depend
on the DSP system used and also on the efficiency of the code written. Furthermore, we only
discuss the computational complexity of the unconstrained PFBLMS algorithm. The
constrained PFBLMS algorithm is not discussed here as its computational complexity
will depend, to a great extent, on how the constraining step is implemented. We make
some comments on this in the next subsection.

In the implementation of the unconstrained PFBLMS algorithm, processing of each data
block requires two $(p+1)L$ $(= M + L)$ point FFTs and one IFFT of the same length.
Assuming that the data signals are all real-valued, $(p+1)L$ is chosen to be a power of 2,
and an efficient FFT algorithm such as the one used by Bergland (1968) is used, then
$\frac{(p+1)L}{4}\log_2\frac{(p+1)L}{2}$ butterflies will have to be performed to complete each FFT.
Computation of output samples in the frequency domain, that is, $\mathbf{x}_{F,0}(k - l) \odot \mathbf{w}_{F,l}(k)$, for
$l = 0, 1, \ldots, P-1$, and implementation of the tap-weight adaptation recursion Eq. (8.81)
require two and a half complex multiplications and two complex additions per data point.
Since step-size parameters are real-valued, multiplication of a gradient term with its
step-size parameter is counted as half a complex multiplication. Furthermore, step-normalization
adds some more computations. To give a simple figure, we put all these computations
(excluding the FFTs and IFFTs) together and roughly say that the complexity of
processing each data point in the frequency domain is equivalent to performing two
butterflies. Noting that each partition of input samples, which consists of $(p+1)L$
real-valued samples in the time domain, is converted to $(p+1)L/2$ complex-valued frequency
domain samples and there are $P$ such partitions, the total number of frequency domain
samples is $(p+1)LP/2$. Adding these together and noting that $L$ output samples are
generated at the end of each block processing interval, we obtain the per-sample computational
complexity of the unconstrained PFBLMS algorithm as

$$\frac{(p+1)LP + \frac{3(p+1)L}{4}\log_2\frac{(p+1)L}{2}}{L} = (p+1)P + \frac{3}{4}(p+1)\log_2\frac{(p+1)L}{2} \tag{8.99}$$


The memory requirements of the unconstrained and constrained PFBLMS algorithms
are about the same. The number of frequency domain data samples (including the
intermediate results in the $z^{-p}$ delay units) is $(p(P-1)+1)(p+1)L$. We also need $(p+1)LP$
memory words to store the filter coefficients. Some additional storage for input, output,
error samples, and step-size parameters is also required. Adding these together, the number
of memory words required to implement the PFBLMS algorithm is approximately

$$S = (p+1)^2 LP \ \text{words} \tag{8.100}$$

To get a feeling for the above numbers, we give the following example.


Example 8.2 


Let us consider an acoustic echo canceler which has to cover an echo spread of at least
250 ms at the sampling rate of 8 kHz. It is recommended that the algorithm latency (delay)
in delivering the echo-free samples shall not exceed 16 ms. To cover an echo spread of
250 ms, an adaptive filter with at least 2000 taps should be used, as 250 ms is equivalent
to 2000 samples at the sampling frequency of 8 kHz. To keep the latency within
16 ms, $L = 64$ is appropriate. Note that there will be a delay of $L$ samples to collect a
new block of input samples, and there will be an additional delay of up to one block
period (i.e., $L$ sample intervals) to calculate the corresponding block of output samples.
This gives a total delay of up to $2L$ sample intervals, which for $L = 64$ and the sampling
rate of 8 kHz is equivalent to 16 ms. Table 8.5 gives a summary of the computational
complexity and memory requirement of the unconstrained PFBLMS algorithm for $p = 1$,
3, and 7. These values of $p$ result in $(p+1)L$ being a power of 2 and, therefore, an
efficient radix-2 FFT algorithm can be used. From these results, we note that $p = 3$ is
a good compromise choice as it results in some reduction in computational complexity
and, as demonstrated in Section 8.5, significant improvement in convergence behavior, at
the cost of a slight increase in memory.

Table 8.5 Computational complexity and memory requirement of the unconstrained PFBLMS
algorithm for the cases discussed in Example 8.2.

                   Computational complexity      Memory words
                   (butterflies per sample)
p = 1, P = 32               73                       8192
p = 3, P = 11               65                     11 264
p = 7, P = 5                88                     20 480
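The figures of this example follow directly from Eqs. (8.99) and (8.100). The sketch below recomputes them; the function name `pfblms_budget` and the variable names are illustrative, and the number of partitions is taken as the smallest $P$ for which $P \cdot pL$ covers the required number of taps.

```python
import math

def pfblms_budget(p, L, taps):
    """Partitions, per-sample butterflies (Eq. 8.99), and memory words (Eq. 8.100)."""
    P = math.ceil(taps / (p * L))             # partitions needed to cover all taps
    fft_len = (p + 1) * L                     # M + L, assumed to be a power of 2
    butterflies = (p + 1) * P + 0.75 * (p + 1) * math.log2(fft_len / 2)
    words = (p + 1) ** 2 * L * P
    return P, butterflies, words

fs = 8000                                     # sampling rate in Hz
taps = int(0.250 * fs)                        # 250 ms echo spread
L = 64
latency_ms = 1000.0 * 2 * L / fs              # up to 2L sample intervals

rows = {p: pfblms_budget(p, L, taps) for p in (1, 3, 7)}
```

Changing `L` or `taps` in this sketch gives a quick way to explore the trade-off between latency, complexity, and memory for other design targets.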


8.4.5 Modified Constrained PFBLMS Algorithm 


Our discussions on the PFBLMS algorithm, so far, suggest that the constrained PFBLMS 
algorithm is significantly more complicated than its unconstrained counterpart. This is 
because the tap weights of all partitions have to be constrained at the end of every 
iteration of the algorithm (see Step 5 in Table 8.3). McLaughlin (1996) has proposed 
a method that significantly reduces the computational complexity of the constrained 
PFBLMS algorithm, while its convergence behavior is almost unaffected. His method 
does not constrain the tap weights at the end of all iterations. In the context of a 
PFBLMS-based acoustic echo canceler, he has a special scheduling method for applying 
the constraint to the various partitions. 

Chan (2000) (see also Chan and Farhang-Boroujeny (2001)) has analyzed McLaughlin’s
constraining method and explained its excellent performance. In the context of a
general constrained PFBLMS algorithm, the following tap-weight constraint scheduling
scheme is suggested by Chan (2000). After every iteration of the PFBLMS algorithm,
the tap weights of one or a few of the partitions are constrained on a rotational basis.
For example, in the first iteration, the tap weights of the first partition are constrained.
In the second iteration, the constraint operation is applied to the second partition. This
process continues until all the partitions are constrained. The constraint operation then
restarts with the first partition. Clearly, in cases where the number of partitions, $P$, is
large, this simple approach can significantly reduce the computational complexity of the
constrained PFBLMS algorithm. Here, to avoid excessive mathematical derivations, we
limit ourselves to confirming the conclusions drawn by Chan (2000) numerically, through
computer simulations.
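The rotational scheduling described above can be sketched as follows, for the case where a single partition is constrained per iteration. The function names `constrain` and `constrain_round_robin`, and the test weights, are hypothetical; a direct $O(n^2)$ DFT again stands in for the FFT.

```python
import cmath

def dft(v, inverse=False):
    """Direct O(n^2) DFT (or inverse DFT); adequate for small demonstration sizes."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[m] * cmath.exp(sign * 2j * cmath.pi * i * m / n) for m in range(n))
           for i in range(n)]
    return [c / n for c in out] if inverse else out

def constrain(w_F, M):
    """Keep the first M time-domain taps and zero the rest (constraint step)."""
    w_time = dft(w_F, inverse=True)
    return dft(w_time[:M] + [0j] * (len(w_F) - M))

def constrain_round_robin(W, M, k):
    """At iteration k, constrain only partition k mod P; return its index."""
    l = k % len(W)
    W[l] = constrain(W[l], M)
    return l

# Hypothetical unconstrained weights for P = 3 partitions, M = 4, L = 2.
P, M, L = 3, 4, 2
W = [[complex(i + l, i - l) for i in range(M + L)] for l in range(P)]
original_last = list(W[P - 1])

visited = [constrain_round_robin(W, M, k) for k in range(P - 1)]
untouched = W[P - 1] == original_last         # last partition not yet constrained
visited.append(constrain_round_robin(W, M, P - 1))

# Largest magnitude among the time-domain taps that the constraint should zero.
tails = [max(abs(c) for c in dft(W[l], inverse=True)[M:]) for l in range(P)]
```

Each iteration then costs two transforms for the constraint instead of $2P$, at the price of every partition being constrained only once in $P$ iterations.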


8.5 Computer Simulations 


In this section, we present some simulation results that confirm the theoretical results
derived in the previous sections. These results also serve to enhance our understanding
of the convergence behavior of the FBLMS and PFBLMS algorithms.

Figure 8.6 Adaptive modeling of an FIR plant.


We consider a modeling problem, as shown in Figure 8.6. The plant, $W_o(z)$, which
is to be identified by the adaptive filter, $W(z)$, is assumed to be an FIR system with an
impulse response stretching over 1985 samples. This choice of filter length allows us to
use an FBLMS algorithm with $L = 64$ and $N' = N + L - 1 = 2^{11}$ for modeling $W_o(z)$
(note that $1985 = 2^{11} - 64 + 1$). $W(z)$ is assumed to have sufficient taps to model $W_o(z)$
perfectly. Two cases of the input, $x(n)$, are considered:

1. a white process.
2. a colored process that is generated by passing white noise through a coloring filter
with the transfer function

$$H(z) = 0.1 - 0.2z^{-1} - 0.3z^{-2} + 0.4z^{-3} + 0.4z^{-4} - 0.2z^{-5} - 0.1z^{-6}$$

Recall that this coloring filter is the same as the filter $H(z)$ used in Chapter 7
(Section 7.6.4). The power spectral density of the process generated by this filter is
shown in Figure 7.7. The samples of the plant impulse response, $w_{o,i}$, are chosen to
be a set of independent, identically distributed random numbers, and they are normalized
so that $\mathbf{w}_o^{\mathrm T}\mathbf{w}_o = 1$. The sequence $e_o(n)$ is an additive white Gaussian noise. It is independent
of $x(n)$ and its variance is set equal to 0.001 for the simulations presented here. So the
expected minimum MSE at the adaptive filter output is 0.001. Three cases of $p = 1$,
3, and 7 are considered. To completely cover the impulse response of the plant, $P$ is
chosen to be 32, 11, and 5, respectively, for these cases. For each case, the step-size
parameter $\mu_o$ is chosen using the misadjustment equations given before so as to result
in 10% misadjustment. The algorithms used are of the step-normalized type. Learning
curves presented here are based on ensemble averages of 100 independent runs for each
curve. The averaged curves are smoothed before being plotted.

Figure 8.7 shows the results of the simulations for white input. As expected, perfor- 
mance of the unconstrained PFBLMS algorithm is quite poor when the overlap is 50% 
among the successive partitions (i.e., the case p = 1) and improves as the overlap is 
reduced by increasing p. The case when no partitioning is applied, that is, corresponding 
to the FBLMS algorithm, is also shown for comparison. 

Figure 8.8 repeats Figure 8.7 for the case when $x(n)$ is generated using the coloring
filter $H(z)$. In this case, the eigenvalue spread of the correlation matrix of $x(n)$ can be as
high as 459. Here also we find that reducing the amount of overlap between successive
partitions improves the performance of the PFBLMS algorithm. Furthermore, there is
very little difference between the results in Figures 8.7 and 8.8. This is in line with the
theoretical results of the previous sections that predict that the step-normalized FBLMS
and PFBLMS algorithms are insensitive to the power spectral density (eigenvalue spread)
of the filter input.

Figure 8.7 Learning curves of the FBLMS and PFBLMS algorithms with white input.
Figure 8.8 Learning curves of the FBLMS and PFBLMS algorithms for a colored input.
Figure 8.9 Learning curves of the constrained PFBLMS algorithm and its modified version.

Figure 8.9 compares the convergence performance of the constrained PFBLMS
algorithm and one of its modified versions, with $p = 1$ and $P = 32$. The filter input is colored
and is generated using the coloring filter $H(z)$. In the implementation of the modified
PFBLMS algorithm, the tap-weight constraint operation is applied on a rotational basis to
only one of the partitions in each iteration. Observe from the results that even though
each partition in the modified constrained PFBLMS algorithm is constrained only once in
every 32 iterations, the resulting performance loss is negligible. Also, by direct inspection
of the learning curves of Figure 8.9, we see that overlap of the partitions has no significant
effect on the convergence behavior of the constrained PFBLMS algorithm. This is because
there is only one dominant mode affecting the convergence behavior of
the constrained PFBLMS algorithm, as can be seen from the learning curves.


Problems 


P8.1 Consider the BLMS recursion Eq. (8.11). In Appendix 8A, it is shown that
Eq. (8.11) can be rearranged as

$$\mathbf{v}(k+1) = \left(\mathbf{I} - 2\frac{\mu_B}{L}\mathbf{X}^{\mathrm T}(k)\mathbf{X}(k)\right)\mathbf{v}(k) + 2\frac{\mu_B}{L}\mathbf{X}^{\mathrm T}(k)\mathbf{e}_o(k)$$

where $\mathbf{v}(k) = \mathbf{w}(k) - \mathbf{w}_o$, $\mathbf{w}_o$ is the optimum tap-weight vector of the filter, and
$\mathbf{e}_o(k) = \mathbf{d}(k) - \mathbf{X}(k)\mathbf{w}_o$.

(i) Assuming $\mathbf{v}(k)$ and $\mathbf{X}(k)$ are independent of each other, show that

$$E[\mathbf{v}(k+1)] = (\mathbf{I} - 2\mu_B\mathbf{R})E[\mathbf{v}(k)]$$

where $\mathbf{R}$ is the correlation matrix of the filter tap inputs.

(ii) Use the result of (i) to obtain the time constants that control the convergence
behavior of $E[\mathbf{v}(k)]$.

(iii) Based on the result obtained in (ii), justify the validity of Eq. (8.12).

P8.2 By direct application of Eqs. (8.21) and (8.26), confirm the identity (8.28).

P8.3 Define the time-reversed version of the vector $\mathbf{a} = [a_0\ a_1\ a_2\ \cdots\ a_{N-1}]^{\mathrm T}$ as
$\mathbf{a}' = [a_0\ a_{N-1}\ a_{N-2}\ \cdots\ a_1]^{\mathrm T}$.

(i) Show that if $\mathbf{a}_F$ and $\mathbf{a}'_F$ are the DFTs of $\mathbf{a}$ and $\mathbf{a}'$, respectively, then

$$\mathbf{a}'_F = \mathbf{a}_F^{*}$$

where the asterisk denotes complex conjugation.

(ii) Show that if $\mathbf{A}_c$ is a circular matrix as in Eq. (8.21), $\mathbf{A}_c^{\mathrm T}$ is also a circular
matrix. Compare the first columns of $\mathbf{A}_c$ and $\mathbf{A}_c^{\mathrm T}$ and show that they are
time-reversed versions of each other.

(iii) Use the above observation to give an alternative derivation of Eq. (8.32).

P8.4 By direct application of Eqs. (8.33), (8.34), (8.42), (8.43), and (8.45), show that
Eq. (8.44) is just an alternative formulation of Eq. (8.11).

P8.5 In the derivation of the LMS algorithm, the instantaneous value of $e^2(n)$ was
used as an estimate of the cost function $\xi = E[e^2(n)]$. Give a direct derivation
of the BLMS algorithm by considering

$$\hat{\xi}_B(k) = \frac{1}{L}\sum_{i=0}^{L-1} e^2(kL + i)$$

as an estimate of the cost function $\xi$ and running the steepest-descent recursion
once after every $L$ samples of the data.

P8.6 The estimate $\hat{\xi}_B(k)$ defined in Problem P8.5 may equivalently be written as

$$\hat{\xi}_B(k) = \frac{1}{L}\hat{\mathbf{e}}^{\mathrm T}(k)\hat{\mathbf{e}}(k) \tag{P8.6.1}$$

where $\hat{\mathbf{e}}(k)$ is the output error vector of the filter in the extended form, as defined
by Eq. (8.43). Using the DFT properties, Eq. (P8.6.1) can be expressed in terms
of $\mathbf{e}_F(k) = \mathcal{F}\hat{\mathbf{e}}(k)$ as

$$\hat{\xi}_B(k) = \frac{1}{LN'}\mathbf{e}_F^{\mathrm H}(k)\mathbf{e}_F(k) \tag{P8.6.2}$$


where N’ = N + L — 1 is the length of the vector ê(k). 
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P8.7 


P8.8 


P8.9 


P8.10 


To obtain the optimum frequency domain tap-weight vector wp, the cost func- 
tion E(w) = Elép (k)] should be minimized. Accordingly, the optimum solution 
obtained by the constrained FBLMS algorithm is the one minimizing (w+), 
subject to the constraint Py oWp = wp. On the other hand, the unconstrained 
FBLMS minimizes (w+) without imposing any constraint on we. 


(i) Show that 


F(we) = —— wERtw, — pw, - wi E(d-d 

F) = Ty WF Ra Wr — PeWe — Wei + Eldzede)) 
where Ry, is as defined in Eq. (8.57) and ps = E[X} Po, dF]. 

(ii) Find the optimum value of wp that minimizes (w+). Show the nonsingu- 
larity of Rý, that exists is the necessary and sufficient condition for this 
solution to be unique. 

(iii) It is understood that when a sufficiently small step-size parameter is used for 
both constrained and unconstrained FBLMS algorithms so that the misad- 
justments of the two algorithms can be ignored, the unconstrained FBLMS 
algorithm converges to a mean-squared error (MSE), which is less than or 
equal to what can be achieved by the constrained FBLMS algorithm. With 
the knowledge developed in this problem, how do you explain this? 


Starting with Eq. (P8.6.2) of the last problem, give a direct derivation of the 
unconstrained recursion (8.49). 


Consider a modeling problem with the desired output $d(n) = \mathbf w_o^{\mathrm T}\mathbf x(n) + e_o(n)$,
where the length of $\mathbf w_o$ is less than or equal to the length of the adaptive filter,
N. Assume that the plant noise, $e_o(n)$, and its input, $x(n)$, are uncorrelated with
each other. Under these conditions, it is understood that the constrained and
unconstrained FBLMS algorithms converge to exactly the same solution. Using
the result obtained in Problem P8.6 and assuming that the inverse of $\mathcal R^{\mathcal F}_{xx}$ exists,
give reasons that explain this. What is the common solution to which both the
constrained and unconstrained FBLMS algorithms converge?


Show that when the block length, L, is equal to the filter length, N, for a given
misadjustment, the constrained FBLMS algorithm converges twice as fast as its
unconstrained counterpart. Support your answer by giving careful consideration
to the time constants associated with the two algorithms. Does your answer
continue to hold if step-normalization is (i) used and (ii) not used?


Consider the constrained FBLMS recursion (8.47). Show that when the block 
length, L, is one: 


(i) $\mathcal P_{0,L} = \mathcal P_{N,0} = \mathbf I$, where $\mathbf I$ is the N-by-N identity matrix. Then, argue that
the constrained and unconstrained FBLMS algorithms are the same.
(ii) the FBLMS recursion can be rearranged as 


wrk + 1) = wek) + 2uPxe(ke(k) 


where x(k) is the DFT of the first column of the circular matrix X,(k), 
as defined by Eq. (8.33), e(k) is the scalar output error at time k, and T 
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is the diagonal matrix consisting of the elements 1, e/?7/%, e/47/%, .. 


ei2(N-1)x/N | 
(iii) 
y(k) = the last term of =F! (we(k) Ox¢(k)) 
= wr(k)x¢(k) 


where © denotes the element-wise multiplication of vectors. 

(iv) Now, consider the TDLMS algorithm with 7 = F. Write down the 
equations corresponding to this case and compare them with the above 
results. Verify that the FBLMS algorithm with block length L = 1 is 
equivalent to the TDLMS algorithm with T = F. 


P8.11 An alternative procedure for derivation of Eq. (8.61) is proposed in this problem. 


(i) Show that 
Po. = FP iF 


and conclude that Py ; is a circular matrix. 
(ii) Show that 


1 
First column of Py, = wit Po, L 


where pPo,z is the column vector consisting of the diagonal elements of Po ;. 
(iii) Considering the fact that 4-(k) is a diagonal matrix, show that 


XF(K)Po Xp = Por O (xe Oxe OM) 


where x(k) is the column vector consisting of the diagonal elements of 
Xzr(k), and © denotes element-wise multiplication of the matrices. Thus, 
show that 

Rix = Pot © Rix 


where R = E[x¢(k)xit(k)]. 
(iv) Assuming that the cross-correlation between different elements of the vector 
x(k) are negligible, show that 


; f j2% jz jN -D 
R,, © N' x diag ba (e Ww), by (e Pona Ba e wW i 


(v) Using the results of (iii) and (iv), derive Eq. (8.61). 

(vi) Do a thorough study of the elements of the matrix Po z. In particular, 
verify that the largest (in magnitude) elements of Po are its diagonal 
elements. Use your findings to conclude that Rý, is closer to diagonal 
than R» in the sense that the nondiagonal elements of the normalized 
matrix (diag[R%.])~'R. are smaller than the corresponding elements of 
(diag[R a DR. 


P8.12 Verify the results presented in Eq. (8.87). 
P8.13 Verify the results presented in Eq. (8.93). 
P8.14 For the case discussed in Example 8.2, evaluate the computational complexity
and memory requirement of the FBLMS algorithm, for both the constrained and
unconstrained cases, and compare your results with those given in Table 8.5.

P8.15 Discuss in detail why the selection of M = pL results in less overlap among
successive partitions as p increases.

P8.16 In the results presented in Table 8.5, we find that the unconstrained PFBLMS
algorithm with p = 3 is less complex than the case where p = 1. Explore the
contribution of various parts of the algorithm to find out why this is happening.

P8.17 Evaluate the computational complexities of the constrained PFBLMS implementation
and its modified version, which were used to obtain the simulation results
of Figure 8.9, and compare your results with those in Table 8.5.


Computer-Oriented Problems 


P8.18 


P8.19 


P8.20 


The MATLAB program “b1k_mdlg.m’ which was used to obtain the results 
of Example 8.1 is available on an accompanying website. Run this program 
and confirm the results of Figure 8.2. In addition, using this program, study the 
convergence behavior of the BLMS algorithm for the following choices of L and 
Meyzs and discuss your findings. 


L Maus 


4N, 5N 10% 
N, 2N, 3N, 4N, 5N 5% 
N, 2N, 3N, 4N, 5N 20% 


Consider a channel equalization problem similar to the one discussed in 
Section 6.4.2. Assume that the channel response is characterized by the transfer 
function 


H(z) = 0.1 + 0.3277! + 0.67? + z7? + 0.5774 — 027° + 0.127 


the input data, s(n), to the channel is binary and white, the channel noise, v(n), 
is white and Gaussian, signal-to-noise ratio at the channel output is 30dB, and 
equalizer length, N, and the delay, A, are set equal to 33 and 18, respectively. 
Develop a simulation program to study the performance of FBLMS algorithm in 
this application. 


Consider the channel equalization setup of problem P8.19. By running appropriate 
simulation programs, study the convergence behaviors of the conventional LMS 
algorithm and the TDLMS algorithm (with various transforms) and compare your 
results with those of the FBLMS algorithm. 
P8.21 Consider a system modeling problem similar to the one in Figure 8.6. Assume
that $W_o(z)$ is an FIR filter with N = 1024 taps, and the length of $W(z)$ is chosen to be
the same. The vector of coefficients of $W_o(z)$, $\mathbf w_o$, has entries that are complex-valued
Gaussian random variables with the same variance and are normalized
such that $\mathbf w_o^{\mathrm H}\mathbf w_o = 1$. Also, assume that $\nu(n)$ is a complex-valued Gaussian white
process with variance of unity. Consider the following cases of the input coloring
filter H(z):

$$H(z) = H_1(z) = 1$$

$$H(z) = H_2(z) = 0.4 + z^{-1} + 0.4z^{-2}$$

$$H(z) = H_3(z) = \frac{1 + 1.2z^{-1}}{1 - 1.5z^{-1} + 0.56z^{-2}}$$

(i) For each case, evaluate the eigenvalue spread $\lambda_{\max}/\lambda_{\min}$ of the correlation
matrix $\mathbf R$ of the input signal to $W(z)$. Note that as N is very large here, you
may use the maximum of $\Phi_{xx}(e^{j\omega})$ and the minimum of $\Phi_{xx}(e^{j\omega})$ as
good approximations to $\lambda_{\max}$ and $\lambda_{\min}$, respectively.

(ii) Develop the necessary codes to compare the convergence behavior of LMS
and FBLMS algorithms. In the case of FBLMS, consider both cases of the
algorithm with and without step-normalization. For all cases, choose the
step-size parameter $\mu$ for a misadjustment of 10%.

P8.22 Develop the necessary codes to implement the PFBLMS structure of Figure 8.4
for both unconstrained and constrained cases. Let M = L = 64 and present the
learning curves of the algorithms when the step-size parameter $\mu$ is chosen to
achieve 10% misadjustment. Compare your results with those of Problem P8.21.

P8.23 Develop the necessary codes to implement the PFBLMS structure of Figure 8.5 for
both unconstrained and constrained cases. Let L = 32 and M = 96, and present
the learning curves of the algorithms when the step-size parameter $\mu$ is chosen to
achieve 10% misadjustment. Compare your results with those of Problems P8.21
and P8.22.

P8.24 Repeat Problems P8.22 and P8.23 when the constrained PFBLMS algorithm is
replaced by its scheduled constrained version. Compare the results with those of
the fully constrained version of the algorithm.
Appendix 8A: Derivation of a Misadjustment Equation for the
BLMS Algorithm


In this appendix, we present a simple derivation of the misadjustment of the BLMS algo- 
rithm. This derivation is different from the one used for the conventional LMS algorithm 
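in Chapter 6. Because of certain assumptions used here (such as the step-size parameter,
$\mu_B$, is small, and the adaptive filter models the plant almost exactly), this derivation is rather less
accurate.

We start with recursion Eq. (8.11) and use the definition $\mathbf v(k) = \mathbf w(k) - \mathbf w_o$, where $\mathbf w_o$
is the optimum tap-weight vector of the filter, to obtain

$$\mathbf v(k+1) = \mathbf v(k) + 2\frac{\mu_B}{L}\mathbf X^{\mathrm T}(k)\mathbf e(k) \tag{8A.1}$$

We also note that

$$\mathbf e(k) = \mathbf d(k) - \mathbf X(k)\mathbf w(k) = \mathbf e_o(k) - \mathbf X(k)\mathbf v(k) \tag{8A.2}$$

where $\mathbf e_o(k) = \mathbf d(k) - \mathbf X(k)\mathbf w_o$ is the output error when the optimum tap-weight vector,
$\mathbf w_o$, is used.
Substituting Eq. (8A.2) in Eq. (8A.1), we get

$$\mathbf v(k+1) = \left(\mathbf I - 2\frac{\mu_B}{L}\mathbf X^{\mathrm T}(k)\mathbf X(k)\right)\mathbf v(k) + 2\frac{\mu_B}{L}\mathbf X^{\mathrm T}(k)\mathbf e_o(k) \tag{8A.3}$$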


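Next, we multiply both sides of Eq. (8A.3) from the left by their respective transposes
and expand to obtain

$$\begin{aligned}
\mathbf v^{\mathrm T}(k+1)\mathbf v(k+1) ={}& \mathbf v^{\mathrm T}(k)\left(\mathbf I - 2\frac{\mu_B}{L}\mathbf X^{\mathrm T}(k)\mathbf X(k)\right)^{2}\mathbf v(k) \\
&+ 2\frac{\mu_B}{L}\,\mathbf e_o^{\mathrm T}(k)\mathbf X(k)\left(\mathbf I - 2\frac{\mu_B}{L}\mathbf X^{\mathrm T}(k)\mathbf X(k)\right)\mathbf v(k) \\
&+ 2\frac{\mu_B}{L}\,\mathbf v^{\mathrm T}(k)\left(\mathbf I - 2\frac{\mu_B}{L}\mathbf X^{\mathrm T}(k)\mathbf X(k)\right)\mathbf X^{\mathrm T}(k)\mathbf e_o(k) \\
&+ 4\frac{\mu_B^2}{L^2}\,\mathbf e_o^{\mathrm T}(k)\mathbf X(k)\mathbf X^{\mathrm T}(k)\mathbf e_o(k)
\end{aligned} \tag{8A.4}$$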


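Now, we follow the same line of derivation as in Chapter 6 (Section 6.3). We take
expectation on both sides of Eq. (8A.4) and assume that $\mathbf e_o(k)$ is zero-mean, $\mathbf X(k)$ and
$\mathbf e_o(k)$ are jointly Gaussian and uncorrelated with each other, and $\mathbf v(k)$ is independent of
$\mathbf X(k)$ and $\mathbf e_o(k)$. This results in

$$\|\mathbf v(k+1)\|^2 = E\left[\mathbf v^{\mathrm T}(k)\left(\mathbf I - 2\frac{\mu_B}{L}\mathbf X^{\mathrm T}(k)\mathbf X(k)\right)^{2}\mathbf v(k)\right] + 4\frac{\mu_B^2}{L^2}E\left[\mathbf e_o^{\mathrm T}(k)\mathbf X(k)\mathbf X^{\mathrm T}(k)\mathbf e_o(k)\right] \tag{8A.5}$$

where $\|\mathbf v(k)\|^2 = E[\mathbf v^{\mathrm T}(k)\mathbf v(k)]$. The first term on the right-hand side of Eq. (8A.5) can
be expanded as

$$\begin{aligned}
E\left[\mathbf v^{\mathrm T}(k)\left(\mathbf I - 2\frac{\mu_B}{L}\mathbf X^{\mathrm T}(k)\mathbf X(k)\right)^{2}\mathbf v(k)\right] ={}& \|\mathbf v(k)\|^2 - 4\frac{\mu_B}{L}E\left[\mathbf v^{\mathrm T}(k)\mathbf X^{\mathrm T}(k)\mathbf X(k)\mathbf v(k)\right] \\
&+ 4\frac{\mu_B^2}{L^2}E\left[\mathbf v^{\mathrm T}(k)\mathbf X^{\mathrm T}(k)\mathbf X(k)\mathbf X^{\mathrm T}(k)\mathbf X(k)\mathbf v(k)\right]
\end{aligned} \tag{8A.6}$$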


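To simplify this, we assume that $\mu_B$ is small so that the last term on the right-hand side
of Eq. (8A.6) can be ignored. Furthermore, using the independence assumption between
$\mathbf v(k)$ and $\mathbf X(k)$ and following the same line of argument as in Chapter 6, we obtain

$$E\left[\mathbf v^{\mathrm T}(k)\left(\mathbf I - 2\frac{\mu_B}{L}\mathbf X^{\mathrm T}(k)\mathbf X(k)\right)^{2}\mathbf v(k)\right] \approx \|\mathbf v(k)\|^2 - 4\frac{\mu_B}{L}E\left[\mathbf v^{\mathrm T}(k)E[\mathbf X^{\mathrm T}(k)\mathbf X(k)]\mathbf v(k)\right] \tag{8A.7}$$

Now note from Eq. (8.4) that

$$E[\mathbf X^{\mathrm T}(k)\mathbf X(k)] = L\mathbf R \tag{8A.8}$$

where $\mathbf R$ is the N-by-N correlation matrix of the filter tap inputs. Substituting Eq. (8A.8)
in Eq. (8A.7), we get

$$E\left[\mathbf v^{\mathrm T}(k)\left(\mathbf I - 2\frac{\mu_B}{L}\mathbf X^{\mathrm T}(k)\mathbf X(k)\right)^{2}\mathbf v(k)\right] \approx \|\mathbf v(k)\|^2 - 4\mu_B E\left[\mathbf v^{\mathrm T}(k)\mathbf R\mathbf v(k)\right] \tag{8A.9}$$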


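To evaluate the second term on the right-hand side of Eq. (8A.5), we note that
$\mathbf e_o^{\mathrm T}(k)\mathbf X(k)\mathbf X^{\mathrm T}(k)\mathbf e_o(k)$ is a scalar and use Eq. (6.23) to write

$$\begin{aligned}
\mathbf e_o^{\mathrm T}(k)\mathbf X(k)\mathbf X^{\mathrm T}(k)\mathbf e_o(k) &= \mathrm{tr}\left[\mathbf e_o^{\mathrm T}(k)\mathbf X(k)\mathbf X^{\mathrm T}(k)\mathbf e_o(k)\right] \\
&= \mathrm{tr}\left[\mathbf e_o(k)\mathbf e_o^{\mathrm T}(k)\mathbf X(k)\mathbf X^{\mathrm T}(k)\right]
\end{aligned} \tag{8A.10}$$

Taking expectation on both sides of Eq. (8A.10) and noting that $\mathbf e_o(k)$ and $\mathbf X(k)$ are
independent of each other, we obtain

$$E\left[\mathbf e_o^{\mathrm T}(k)\mathbf X(k)\mathbf X^{\mathrm T}(k)\mathbf e_o(k)\right] = \mathrm{tr}\left[E[\mathbf e_o(k)\mathbf e_o^{\mathrm T}(k)]\,E[\mathbf X(k)\mathbf X^{\mathrm T}(k)]\right] \tag{8A.11}$$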


Next, we assume that the elements of $\mathbf e_o(k)$ are samples of a white noise process. This
assumption is justified when the adaptive filter is long enough to model the plant almost
exactly. This implies that

$$E[\mathbf e_o(k)\mathbf e_o^{\mathrm T}(k)] = \xi_{\min}\mathbf I \tag{8A.12}$$

where $\xi_{\min} = E[e_o^2(n)]$ is the minimum MSE at the filter output, and the identity
matrix $\mathbf I$ is L-by-L. Substituting Eq. (8A.12) in Eq. (8A.11), noting that
$\mathrm{tr}[\mathbf X(k)\mathbf X^{\mathrm T}(k)] = \mathrm{tr}[\mathbf X^{\mathrm T}(k)\mathbf X(k)]$, and using Eq. (8A.8), we get

$$E\left[\mathbf e_o^{\mathrm T}(k)\mathbf X(k)\mathbf X^{\mathrm T}(k)\mathbf e_o(k)\right] = L\,\xi_{\min}\mathrm{tr}[\mathbf R] \tag{8A.13}$$

Substituting Eqs. (8A.13) and (8A.9) in Eq. (8A.5), we obtain

$$\|\mathbf v(k+1)\|^2 \approx \|\mathbf v(k)\|^2 - 4\mu_B E\left[\mathbf v^{\mathrm T}(k)\mathbf R\mathbf v(k)\right] + 4\frac{\mu_B^2}{L}\xi_{\min}\mathrm{tr}[\mathbf R] \tag{8A.14}$$

When the algorithm has converged and reached its steady state, $\|\mathbf v(k+1)\|^2 = \|\mathbf v(k)\|^2$.
Using this in Eq. (8A.14), we obtain, in the steady state

$$E\left[\mathbf v^{\mathrm T}(k)\mathbf R\mathbf v(k)\right] \approx \frac{\mu_B}{L}\xi_{\min}\mathrm{tr}[\mathbf R] \tag{8A.15}$$

We recall that the left-hand side of Eq. (8A.15) is equal to the excess MSE of the algorithm
after its convergence (see Eq. (6.21) and the subsequent discussions in the same section).
Thus, we obtain

$$\text{Excess MSE of the BLMS algorithm} \approx \frac{\mu_B}{L}\xi_{\min}\mathrm{tr}[\mathbf R] \tag{8A.16}$$

Dividing this excess MSE by the minimum MSE, $\xi_{\min}$, we obtain Eq. (8.13).
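
The steady-state argument of Eqs. (8A.14) to (8A.16) can be illustrated with a small numerical sketch. It assumes, purely for illustration, a white input with $\mathbf R = \sigma^2\mathbf I$, so that $E[\mathbf v^{\mathrm T}\mathbf R\mathbf v] = \sigma^2\|\mathbf v\|^2$ and Eq. (8A.14) becomes a scalar recursion; the function name and the parameter values are hypothetical.

```python
def blms_excess_mse_white(mu_b, L, N, sigma2, xi_min, iters=5000):
    """Iterate the scalar form of Eq. (8A.14) for R = sigma2 * I (an assumption).

    Returns sigma2 * ||v||^2 after `iters` blocks, i.e. E[v^T R v] for this R,
    which Eq. (8A.15) identifies with the steady-state excess MSE.
    """
    trace_r = N * sigma2          # tr[R] for R = sigma2 * I
    norm2 = 1.0                   # arbitrary starting value of ||v(0)||^2
    for _ in range(iters):
        norm2 = (norm2
                 - 4.0 * mu_b * sigma2 * norm2
                 + 4.0 * mu_b ** 2 / L * xi_min * trace_r)
    return sigma2 * norm2

# Hypothetical parameter values chosen only for the demonstration.
mu_b, L, N, sigma2, xi_min = 0.01, 8, 16, 1.0, 0.1
excess = blms_excess_mse_white(mu_b, L, N, sigma2, xi_min)
predicted = mu_b / L * xi_min * N * sigma2      # right-hand side of Eq. (8A.16)
print("iterated:", excess, "predicted:", predicted)
print("misadjustment:", excess / xi_min)
```

The recursion only converges when $0 < 4\mu_B\sigma^2 < 2$; the sketch checks the algebra of the appendix, not the accuracy of its underlying assumptions.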
Appendix 8B: Derivation of Misadjustment Equations for the
FBLMS Algorithms


Let us start with the definition of misadjustment. We recall from Chapter 6 that for an
adaptive algorithm, misadjustment is defined by the equation

$$\mathcal M = \frac{\xi_{\rm excess}}{\xi_{\min}} \tag{8B.1}$$

where $\xi_{\rm excess}$ is the excess MSE due to perturbation of the filter tap weights after the
algorithm has reached its steady state, and $\xi_{\min}$ is the minimum MSE that could be
achieved by the optimum tap weights. The excess MSE, as defined before (in Chapter 6),
is given by the following equation and is evaluated after the convergence of the filter:

$$\xi_{\rm excess} = E[(\mathbf v^{\mathrm T}(n)\mathbf x(n))^2] \tag{8B.2}$$

Here, $\mathbf v(n) = \mathbf w(n) - \mathbf w_o$ is the tap-weight perturbation vector and thus, $\mathbf v^{\mathrm T}(n)\mathbf x(n)$ is an
associated error quantity.

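In the case of the FBLMS algorithm, where the perturbation vector $\mathbf v(k)$ varies only once
every block, the excess MSE is defined as

$$\xi_{\rm excess} = \frac{1}{L}E\left[(\mathbf X(k)\mathbf v(k))^{\mathrm T}\mathbf X(k)\mathbf v(k)\right] \tag{8B.3}$$

where $\mathbf X(k)\mathbf v(k)$ is the length L vector of error samples arising from the tap-weight
perturbation $\mathbf v(k)$ during the kth block.

If $\mathbf w_{\mathcal F}(k)$ in Eq. (8.41) is replaced by $\mathbf v_{\mathcal F}(k)$, where $\mathbf v_{\mathcal F}(k)$ is defined as in Eq. (8.55),
the result would be the error due to the tap-weight error $\mathbf v_{\mathcal F}(k)$. Using this result, we
obtain

$$\xi_{\rm excess} = \frac{1}{L}E\left[\left(\mathbf P_{0,L}\mathcal F^{-1}\mathcal X_{\mathcal F}(k)\mathbf v_{\mathcal F}(k)\right)^{\mathrm H}\mathbf P_{0,L}\mathcal F^{-1}\mathcal X_{\mathcal F}(k)\mathbf v_{\mathcal F}(k)\right] \tag{8B.4}$$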


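Note that the transpose operator "T" is replaced by the Hermitian transpose operator "H"
in Eq. (8B.4) as the frequency domain variables are, in general, complex-valued. It should
also be noted that the $\mathbf v_{\mathcal F}(k)$ in Eq. (8B.4) need not be constrained, that is, the last
L − 1 samples of $\mathcal F^{-1}\mathbf v_{\mathcal F}(k)$ need not be zero. Accordingly, Eq. (8B.4) can be used for
evaluating the excess MSE for both constrained and unconstrained FBLMS algorithms.

Rearranging the terms under the expectation in Eq. (8B.4), and noting that $\mathbf P_{0,L}^{\mathrm H} = \mathbf P_{0,L}$,
$\mathbf P_{0,L}^2 = \mathbf P_{0,L}$, and $(\mathcal F^{-1})^{\mathrm H} = \frac{1}{N'}\mathcal F$, we obtain

$$\begin{aligned}
\xi_{\rm excess} &= \frac{1}{LN'}E\left[\mathbf v_{\mathcal F}^{\mathrm H}(k)\mathcal X_{\mathcal F}^{\mathrm H}(k)\left(\mathcal F\mathbf P_{0,L}\mathcal F^{-1}\right)\mathcal X_{\mathcal F}(k)\mathbf v_{\mathcal F}(k)\right] \\
&= \frac{1}{LN'}E\left[\mathbf v_{\mathcal F}^{\mathrm H}(k)\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathcal X_{\mathcal F}(k)\mathbf v_{\mathcal F}(k)\right]
\end{aligned} \tag{8B.5}$$

We recall that the length of $\mathbf v_{\mathcal F}(k)$ is $N' = N + L - 1$, and $\mathcal X_{\mathcal F}(k)$ is an N'-by-N' diagonal
matrix. Assuming that $\mathbf v_{\mathcal F}(k)$ and $\mathcal X_{\mathcal F}(k)$ are independent of each other, we obtain from
Eq. (8B.5)

$$\xi_{\rm excess} = \frac{1}{LN'}E\left[\mathbf v_{\mathcal F}^{\mathrm H}(k)\mathcal R^{\mathcal F}_{xx}\mathbf v_{\mathcal F}(k)\right] \tag{8B.6}$$

where $\mathcal R^{\mathcal F}_{xx} = E[\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathcal X_{\mathcal F}(k)]$, as defined in Eq. (8.57).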
Proceeding in the same line of derivations as in Chapter 6 (Section 6.3), we obtain
from Eq. (8B.6)

$$\xi_{\rm excess} = \frac{1}{LN'}\mathrm{tr}\left[\mathbf K_{\mathcal F}(k)\mathcal R^{\mathcal F}_{xx}\right] \tag{8B.7}$$

where $\mathbf K_{\mathcal F}(k) = E[\mathbf v_{\mathcal F}(k)\mathbf v_{\mathcal F}^{\mathrm H}(k)]$.

With the expressions Eqs. (8B.6) and (8B.7) for $\xi_{\rm excess}$, we are now ready to proceed
with the derivation of the excess MSE for the various implementations of the FBLMS
algorithm.


Unconstrained FBLMS Algorithm Without Step-Normalization 


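We multiply both sides of Eq. (8.56) from the right by their respective Hermitian transposes,
expand, take expectation on both sides, and use similar assumptions as those used
in deriving Eq. (8A.6), to obtain

$$\begin{aligned}
\|\mathbf v_{\mathcal F}(k+1)\|^2 \approx{}& \|\mathbf v_{\mathcal F}(k)\|^2 - 4\mu E\left[\mathbf v_{\mathcal F}^{\mathrm H}(k)\mathcal R^{\mathcal F}_{xx}\mathbf v_{\mathcal F}(k)\right] \\
&+ 4\mu^2 E\left[\left(\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\left(\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\right]
\end{aligned} \tag{8B.8}$$

In the steady state, $\|\mathbf v_{\mathcal F}(k+1)\|^2 = \|\mathbf v_{\mathcal F}(k)\|^2$. Thus, when the algorithm has reached its
steady state, we obtain from Eq. (8B.8)

$$\begin{aligned}
E\left[\mathbf v_{\mathcal F}^{\mathrm H}(k)\mathcal R^{\mathcal F}_{xx}\mathbf v_{\mathcal F}(k)\right] &\approx \mu E\left[\left(\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\left(\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\right] \\
&= \mu E\left[\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\mathcal X_{\mathcal F}(k)\mathcal X_{\mathcal F}^{\mathrm H}(k)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\right]
\end{aligned} \tag{8B.9}$$

We note that the last expectation in Eq. (8B.9) is a scalar and thus, using Eq. (6.23), it
may be rearranged as

$$\begin{aligned}
& E\left[\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\mathcal X_{\mathcal F}(k)\mathcal X_{\mathcal F}^{\mathrm H}(k)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\right] \\
&\quad= E\left[\mathrm{tr}\left[\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\mathcal X_{\mathcal F}(k)\mathcal X_{\mathcal F}^{\mathrm H}(k)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\right]\right] \\
&\quad= \mathrm{tr}\left[E\left[\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\mathcal X_{\mathcal F}(k)\mathcal X_{\mathcal F}^{\mathrm H}(k)\right]\right] \\
&\quad= \mathrm{tr}\left[E\left[\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\right]E\left[\mathcal X_{\mathcal F}(k)\mathcal X_{\mathcal F}^{\mathrm H}(k)\right]\right]
\end{aligned} \tag{8B.10}$$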


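where the last equality follows from the independence assumption. Furthermore, we note
that

$$\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k) = \mathcal F\mathbf P_{0,L}\mathcal F^{-1}\mathbf e_{o,\mathcal F}(k) = \mathcal F\hat{\mathbf e}_o(k) \tag{8B.11}$$

where $\hat{\mathbf e}_o(k) = [0 \;\; 0 \;\; \cdots \;\; 0 \;\; e_o(kL) \;\; e_o(kL+1) \;\; \cdots \;\; e_o(kL+L-1)]^{\mathrm T}$ is the optimum
output error vector in the extended form. Using Eq. (8B.11), we obtain

$$E\left[\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\right] = N'\mathcal F E\left[\hat{\mathbf e}_o(k)\hat{\mathbf e}_o^{\mathrm T}(k)\right]\mathcal F^{-1} \tag{8B.12}$$

where we have noted that $\mathcal F^{\mathrm H} = N'\mathcal F^{-1}$ and $\hat{\mathbf e}_o^{\mathrm H}(k)$ is replaced by $\hat{\mathbf e}_o^{\mathrm T}(k)$ as $\hat{\mathbf e}_o(k)$ is
assumed to be a real-valued vector. Assuming that the optimum error terms $e_o(kL)$,
$e_o(kL+1), \ldots, e_o(kL+L-1)$ are samples of a white noise process with variance
$E[e_o^2(n)]$ and noting that $E[e_o^2(n)] = \xi_{\min}$, we get

$$E\left[\hat{\mathbf e}_o(k)\hat{\mathbf e}_o^{\mathrm T}(k)\right] = \xi_{\min}\mathbf P_{0,L} \tag{8B.13}$$

where $\mathbf P_{0,L}$ is defined as in Eq. (8.36).
Substituting Eq. (8B.13) in Eq. (8B.12), we get

$$E\left[\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\right] = N'\xi_{\min}\mathcal P_{0,L} \tag{8B.14}$$

Using this result and the identity (6.23), we obtain

$$\begin{aligned}
& \mathrm{tr}\left[E\left[\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\right]E\left[\mathcal X_{\mathcal F}(k)\mathcal X_{\mathcal F}^{\mathrm H}(k)\right]\right] \\
&\quad= N'\xi_{\min}\mathrm{tr}\left[E\left[\mathcal P_{0,L}\mathcal X_{\mathcal F}(k)\mathcal X_{\mathcal F}^{\mathrm H}(k)\right]\right] \\
&\quad= N'\xi_{\min}\mathrm{tr}\left[E\left[\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathcal X_{\mathcal F}(k)\right]\right] \\
&\quad= N'\xi_{\min}\mathrm{tr}\left[\mathcal R^{\mathcal F}_{xx}\right] \\
&\quad= LN'^2\phi_{xx}(0)\xi_{\min}
\end{aligned} \tag{8B.15}$$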
where the last equality follows from the identity 
[RH] = [FRF] 
= [F FR}, 
= [RY] = N’$,, (0) (8B.16) 


which is obtained from Eqs. (8.57) and (8.60). 
Substituting Eq. (8B.15) in Eq. (8B.10), and taking the result back to Eqs. (8B.6)
through (8B.9), we obtain

$$\xi_{\rm excess} = \mu N'\phi_{xx}(0)\xi_{\min} \tag{8B.17}$$

Substituting this result in Eq. (8B.1), we get the misadjustment for the unconstrained
FBLMS algorithm without step-normalization as

$$\mathcal M_{\rm FBLMS} = \mu N'\phi_{xx}(0) \tag{8B.18}$$


Unconstrained FBLMS algorithm with step-normalization 


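Following the same line of derivations as in Section 8.3.2, for the present case, we obtain

$$\mathbf v_{\mathcal F}(k+1) = \left(\mathbf I - 2\mu_o\boldsymbol\Lambda^{-1}\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathcal X_{\mathcal F}(k)\right)\mathbf v_{\mathcal F}(k) + 2\mu_o\boldsymbol\Lambda^{-1}\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k) \tag{8B.19}$$

where $\boldsymbol\Lambda = E[\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal X_{\mathcal F}(k)]$. Postmultiplying both sides of Eq. (8B.19) by their respective
Hermitian transposes, taking expectation, assuming that $\mathbf e_{o,\mathcal F}(k)$ is zero mean and independent
of $\mathcal X_{\mathcal F}(k)$, and applying manipulations and approximations similar to those used above,
we obtain

$$\begin{aligned}
\mathbf K_{\mathcal F}(k+1) \approx{}& \mathbf K_{\mathcal F}(k) - 2\mu_o\boldsymbol\Lambda^{-1}\mathcal R^{\mathcal F}_{xx}\mathbf K_{\mathcal F}(k) - 2\mu_o\mathbf K_{\mathcal F}(k)\mathcal R^{\mathcal F}_{xx}\boldsymbol\Lambda^{-1} \\
&+ 4\mu_o^2\boldsymbol\Lambda^{-1}E\left[\mathcal X_{\mathcal F}^{\mathrm H}(k)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\mathcal X_{\mathcal F}(k)\right]\boldsymbol\Lambda^{-1}
\end{aligned} \tag{8B.20}$$

where $\mathbf K_{\mathcal F}(k) = E[\mathbf v_{\mathcal F}(k)\mathbf v_{\mathcal F}^{\mathrm H}(k)]$. Since $\mathbf e_{o,\mathcal F}(k)$ and $\mathcal X_{\mathcal F}(k)$ are independent, we get using
Eq. (8B.14)

$$\begin{aligned}
& E\left[\mathcal X_{\mathcal F}^{\mathrm H}(k)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\mathcal X_{\mathcal F}(k)\right] \\
&\quad= E\left[\mathcal X_{\mathcal F}^{\mathrm H}(k)E\left[\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)\left(\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k)\right)^{\mathrm H}\right]\mathcal X_{\mathcal F}(k)\right] \\
&\quad= N'\xi_{\min}E\left[\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathcal X_{\mathcal F}(k)\right] \\
&\quad= N'\xi_{\min}\mathcal R^{\mathcal F}_{xx}
\end{aligned} \tag{8B.21}$$

We note that $\boldsymbol\Lambda$ is a diagonal matrix consisting of the estimates of the powers of the input
signal samples in the frequency domain. Considering the spectral separation property of
the DFT (see, e.g., Oppenheim and Schafer (1975)), we obtain

$$\boldsymbol\Lambda \approx N' \times \mathrm{diag}\left(\Phi_{xx}(e^{j0}),\; \Phi_{xx}(e^{j2\pi/N'}),\; \ldots,\; \Phi_{xx}(e^{j2\pi(N'-1)/N'})\right) \tag{8B.22}$$

where $\Phi_{xx}(e^{j\omega})$ is the power spectral density of the underlying input process, x(n).
The factor N' in Eq. (8B.22) is the length of the DFT in the present case. Comparing
Eq. (8B.22) with Eq. (8.61), we find that

$$\boldsymbol\Lambda \approx \frac{N'}{L}\mathcal R^{\mathcal F}_{xx} \tag{8B.23}$$

Substituting Eqs. (8B.21) and (8B.23) in Eq. (8B.20), we get

$$\mathbf K_{\mathcal F}(k+1) \approx \mathbf K_{\mathcal F}(k) - 4\mu_o\frac{L}{N'}\mathbf K_{\mathcal F}(k) + 4\mu_o^2\xi_{\min}\frac{L^2}{N'}\left(\mathcal R^{\mathcal F}_{xx}\right)^{-1} \tag{8B.24}$$

In the steady state, when $\mathbf K_{\mathcal F}(k+1) = \mathbf K_{\mathcal F}(k)$, we obtain

$$\mathbf K_{\mathcal F}(k) \approx \mu_o\xi_{\min}L\left(\mathcal R^{\mathcal F}_{xx}\right)^{-1} \tag{8B.25}$$

Substituting Eq. (8B.25) in Eq. (8B.7), we get

$$\xi_{\rm excess} \approx \mu_o\xi_{\min} \tag{8B.26}$$

Substituting this result in Eq. (8B.1), we obtain the misadjustment for the unconstrained
FBLMS algorithm with step-normalization as

$$\mathcal M_{\rm FBLMS} \approx \mu_o \tag{8B.27}$$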
Constrained FBLMS algorithm without step-normalization


In this case, the FBLMS algorithm is an exact and fast implementation of the BLMS
algorithm, that is, with a reduced computational complexity. Hence, the corresponding
excess MSE is given by Eq. (8A.16). Noting that $\mathbf R$ is N-by-N and its diagonal elements
are all equal to $\phi_{xx}(0)$, Eq. (8A.16), with $\mu = \mu_B/L$, may also be written as

$$\xi_{\rm excess} = \mu N\xi_{\min}\phi_{xx}(0) \tag{8B.28}$$

to be in line with the rest of the results in this appendix. Substituting this result in
Eq. (8B.1), we obtain the corresponding misadjustment as

$$\mathcal M_{\rm FBLMS} \approx \mu N\phi_{xx}(0) \tag{8B.29}$$


Constrained FBLMS algorithm with step-normalization 


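Premultiplication of the gradient vector $\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathbf e_{\mathcal F}(k)$ by the matrix $\mathcal P_{N,0}$ implements
the constraining step (8.47). Combining this step with step-normalization, we get the
recursion

$$\mathbf v_{\mathcal F}(k+1) = \left(\mathbf I - 2\mu_o\boldsymbol\Lambda^{-1}\mathcal P_{N,0}\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathcal X_{\mathcal F}(k)\right)\mathbf v_{\mathcal F}(k) + 2\mu_o\boldsymbol\Lambda^{-1}\mathcal P_{N,0}\mathcal X_{\mathcal F}^{\mathrm H}(k)\mathcal P_{0,L}\mathbf e_{o,\mathcal F}(k) \tag{8B.30}$$

analogous to Eq. (8B.19). Following the same line of derivations as in the case of
Eq. (8B.19), we obtain, in the steady state,

$$-2\mu_o\boldsymbol\Lambda^{-1}\mathcal P_{N,0}\mathcal R^{\mathcal F}_{xx}\mathbf K_{\mathcal F}(k) - 2\mu_o\mathbf K_{\mathcal F}(k)\mathcal R^{\mathcal F}_{xx}\mathcal P_{N,0}\boldsymbol\Lambda^{-1} + 4\mu_o^2N'\xi_{\min}\boldsymbol\Lambda^{-1}\mathcal P_{N,0}\mathcal R^{\mathcal F}_{xx}\mathcal P_{N,0}\boldsymbol\Lambda^{-1} = 0 \tag{8B.31}$$

We shall now solve this equation to find $\mathbf K_{\mathcal F}(k)$.
To proceed, let us define

$$\mathbf G = \boldsymbol\Lambda^{-1}\mathcal P_{N,0}\mathcal R^{\mathcal F}_{xx} \tag{8B.32}$$

and note that $\mathbf G^{\mathrm H} = \mathcal R^{\mathcal F}_{xx}\mathcal P_{N,0}\boldsymbol\Lambda^{-1}$ as $\mathcal P_{N,0}$ is Hermitian and $\mathcal R^{\mathcal F}_{xx}$ and $\boldsymbol\Lambda^{-1}$ are diagonal
matrices. Using these, Eq. (8B.31) may be rearranged as

$$\mathbf G\mathbf K_{\mathcal F}(k) - \mu_oN'\xi_{\min}\mathbf G\mathcal P_{N,0}\boldsymbol\Lambda^{-1} + \mathbf K_{\mathcal F}(k)\mathbf G^{\mathrm H} - \mu_oN'\xi_{\min}\boldsymbol\Lambda^{-1}\mathcal P_{N,0}\mathbf G^{\mathrm H} = 0 \tag{8B.33}$$

or

$$\mathbf G\left(\mathbf K_{\mathcal F}(k) - \mu_oN'\xi_{\min}\mathcal P_{N,0}\boldsymbol\Lambda^{-1}\right) + \left(\mathbf K_{\mathcal F}(k) - \mu_oN'\xi_{\min}\boldsymbol\Lambda^{-1}\mathcal P_{N,0}\right)\mathbf G^{\mathrm H} = 0 \tag{8B.34}$$

A general solution of Eq. (8B.34) turns out to be difficult. However, a trivial solution,
which closely matches the simulation results, can be easily identified as

$$\mathbf K_{\mathcal F}(k) = \mu_oN'\xi_{\min}\mathcal P_{N,0}\boldsymbol\Lambda^{-1} \tag{8B.35}$$

Substituting Eq. (8B.35) in Eq. (8B.7), we get

$$\xi_{\rm excess} = \mu_o\xi_{\min}\frac{1}{L}\mathrm{tr}\left[\mathcal P_{N,0}\boldsymbol\Lambda^{-1}\mathcal R^{\mathcal F}_{xx}\right] \tag{8B.36}$$

Using Eq. (8B.23) in Eq. (8B.36), we obtain

$$\xi_{\rm excess} = \mu_o\xi_{\min}\frac{1}{N'}\mathrm{tr}\left[\mathcal P_{N,0}\right] \tag{8B.37}$$

Noting that

$$\mathrm{tr}[\mathcal P_{N,0}] = \mathrm{tr}[\mathcal F\mathbf P_{N,0}\mathcal F^{-1}] = \mathrm{tr}[\mathcal F^{-1}\mathcal F\mathbf P_{N,0}] = \mathrm{tr}[\mathbf P_{N,0}] = N$$

we get from Eq. (8B.37)

$$\xi_{\rm excess} = \mu_o\xi_{\min}\frac{N}{N'} \tag{8B.38}$$

Substituting Eq. (8B.38) in Eq. (8B.1), we obtain the misadjustment for the constrained
FBLMS algorithm with step-normalization as

$$\mathcal M_{\rm FBLMS} = \mu_o\frac{N}{N'} \tag{8B.39}$$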
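
The four misadjustment results of this appendix, Eqs. (8B.18), (8B.27), (8B.29), and (8B.39), can be collected in one small helper for quick comparison. This is only a convenience sketch: the function name, its argument names, and the example numbers are hypothetical, and `mu` stands for $\mu$ in the unnormalized cases and for $\mu_o$ in the normalized ones.

```python
def fblms_misadjustment(mu, N, L, phi0, constrained, normalized):
    """Approximate FBLMS misadjustment according to Appendix 8B.

    mu          : step size (mu without normalization, mu_o with normalization)
    N, L        : filter length and block length, so that N' = N + L - 1
    phi0        : input power phi_xx(0); unused when normalized is True
    constrained : True for the constrained algorithm
    normalized  : True when step-normalization is used
    """
    n_prime = N + L - 1
    if normalized:
        # Eq. (8B.39) for the constrained case, Eq. (8B.27) otherwise.
        return mu * N / n_prime if constrained else mu
    # Eq. (8B.29) for the constrained case, Eq. (8B.18) otherwise.
    return mu * (N if constrained else n_prime) * phi0

# Example with hypothetical numbers: N = L = 64, unit input power.
for constrained in (False, True):
    for normalized in (False, True):
        m = fblms_misadjustment(1e-3, 64, 64, 1.0, constrained, normalized)
        print(constrained, normalized, m)
```

For a fixed step size, the constrained variants always show the smaller misadjustment, by the factor N/N' in both the normalized and the unnormalized case.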
Subband Adaptive Filters


In the last two chapters, we have discussed two classes of LMS adaptive filtering algo- 
rithms that have improved convergence behavior compared to the conventional LMS 
algorithm. Convergence improvement in both classes was found to be a direct conse- 
quence of using orthogonal transforms for decomposing the filter input into a number of 
partially mutually exclusive bands. This was referred to as band-partitioning. Moreover, 
our study of transform domain adaptive filters in Chapter 7 clearly showed that the imper- 
fect separation of the input signal into mutually exclusive bands is the main reason for 
the suboptimal convergence behavior of such filters. 

In this chapter, we present another class of adaptive filters that also uses the concept of 
band-partitioning to improve the convergence behavior of LMS algorithm. This structure, 
which is called subband adaptive filter, is different from the transform domain adaptive 
filters in many ways. Firstly, the filters used for band-partitioning of the input signal are 
well-designed filters with high stop-band rejection, that is, very low side lobes. As a result, 
we find that the subband adaptive filters achieve higher degree of improvement in conver- 
gence compared to the transform domain adaptive filters of Chapter 7. Secondly, because 
of the high stop-band rejection, the subband signals can be decimated (down-sampled to a 
lower rate) before doing any filtering in subbands. Thirdly, implementation of subband fil- 
ters at a decimated rate results in significant reduction in the computational complexity of 
the overall filter. However, this reduction is not as significant as what is usually achieved 
by the fast block LMS (FBLMS) algorithm of Chapter 8. We will make some comments 
on comparison of the subband adaptive structure and FBLMS algorithm in Section 9.11. 

The subject of subband filtering is closely related to multirate signal processing. In a 
subband adaptive filter, the filter input is first partitioned into a set of subband signals 
through an analysis filter bank. These subband signals are then decimated to a lower 
rate and passed through a set of independent or partially independent adaptive filters that 
operate at the decimated rate. The outputs from these filters are subsequently combined 
together using a synthesis filter bank to reconstruct the fullband output of the overall 
filter. The DFT filter banks are commonly used for efficient realization of the analysis 
and synthesis filter banks. We thus start this chapter with a short review of the DFT 
filter banks and introduce the method of weighted overlap—add for efficient realization of 
these filter banks. We also discuss the conditions that should be imposed on the analysis 
and synthesis filters so that the reconstructed fullband signals have negligible distortion. 
For a deeper study on multirate signal processing, the reader may refer to Crochiere and
Rabiner (1983) or Vaidyanathan (1993), for example. 

Successful implementation of subband adaptive filters requires careful design of analysis 
and synthesis filters. Much of our effort in this chapter is thus devoted to the design of 
analysis and synthesis filters that are suitable for subband adaptive filtering. 


9.1 DFT Filter Banks 


Consider the case where a sequence, x(n), has to be separated into a number of subbands. 
For this, we may start with a lowpass filter, H(z), and proceed as follows. By passing 
x(n) through H(z), the low-frequency part of its spectrum is extracted. To extract any
other part of the spectrum of x(n), say, the part centered around the frequency $\omega = \omega_i$,
we may shift the desired portion of the spectrum to the baseband (i.e., around $\omega = 0$) by
multiplying x(n) with the complex sinusoid $e^{-j\omega_i n}$, and then use the lowpass filter H(z)
to extract that. The filter H(z), which is repeatedly used for extraction of different parts
of the input spectrum, is called the prototype filter. 
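
The modulate-then-lowpass idea can be tried on a toy signal. The sketch below is an illustration under stated assumptions: the prototype is a length-M moving average (chosen only because its nulls fall exactly on the other band centers), the input is the sum of two complex tones at the band centers $2\pi/M$ and $4\pi/M$, and the helper `extract_band` is a hypothetical name.

```python
import cmath

def extract_band(x, h, omega):
    """Shift the band centered at `omega` to baseband, then lowpass with h."""
    shifted = [xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x)]
    # Causal FIR filtering; samples before n = 0 are treated as zero.
    return [sum(h[m] * shifted[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(shifted))]

M = 4
centers = [2 * cmath.pi * i / M for i in range(M)]   # band centers 2*pi*i/M

# Two unit-amplitude complex tones sitting exactly on band centers 1 and 2.
x = [cmath.exp(1j * centers[1] * n) + cmath.exp(1j * centers[2] * n)
     for n in range(16)]

h = [1.0 / M] * M             # toy prototype: length-M moving average
band1 = extract_band(x, h, centers[1])

# After the filter transient, only the tone of band 1 survives (as a constant).
print([round(abs(v), 6) for v in band1[M - 1:]])
```

A practical prototype would be a much longer, carefully designed lowpass filter; the moving average is used here only to keep the arithmetic transparent.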

Using this method, a sequence, x(n), can be partitioned into any set of arbitrary bands.
As the separated subband signals are in baseband and have a smaller bandwidth than the
original fullband signal, they have a lower Nyquist rate and thus may be decimated (down-sampled)
to a lower rate before any further processing. Figure 9.1 depicts the steps required
for partitioning a sequence x(n) into M equally spaced subbands, centered at frequencies

$$\omega_i = \frac{2\pi i}{M}, \quad \text{for } i = 0, 1, \ldots, M-1$$

and decimating the subband signals using a decimation factor L. The structure of
Figure 9.1 is known as DFT analysis filter bank, for reasons that will become clear
shortly. In Figure 9.1, decimation is denoted by a downward arrow followed by the
decimation factor, L. We may also note that, in Figure 9.1, the time index n is used
for the fullband input sequence x(n). In contrast, we use the time index k for subband
sequences. These choices of time indices will be consistently followed throughout this
chapter. Furthermore, the subband signals are represented by over-bar variables, such as
the $\bar x_i(k)$'s in Figure 9.1, so as to distinguish them from fullband signals.

Figure 9.1 DFT analysis filter bank.

We may note that in the structure of Figure 9.1, there is no restriction on the bandwidth 
of the prototype filter, H(z), the number of subbands, M, and the decimation factor L. 
Thus, there may be some overlap between different subbands. However, if L is chosen too 
large, the decimated subband signals may suffer from aliasing effects. Although aliasing 
is not desirable in most applications, we will later show that a small amount of aliasing 
may be beneficial in the implementation of subband adaptive filters. 

A general procedure for efficient realization of the DFT filter banks, for any choice of 
L and M, is the weighted overlap—add method. When M is a multiple of L, a slightly 
different procedure that leads to the so-called polyphase filter bank structure may be more 
useful from the point of view of computational complexity (Crochiere and Rabiner, 1983; 
Vaidyanathan, 1993). Since M is not necessarily a multiple of L in most applications of 
subband adaptive filters, we discuss only the weighted overlap—add method in the rest of 
this section. 


9.1.1 Weighted Overlap—Add Method for Realization of DFT Analysis 
Filter Banks 


To begin with, let us define

$$W_M = e^{j(2\pi/M)}$$

where $j = \sqrt{-1}$. Then, the ith output of the DFT analysis filter bank may be expressed
as (see Footnote 1)

$$\bar x_i(k) = \sum_{n=-\infty}^{\infty} h_n x_i(kL-n) \tag{9.1}$$

where

$$x_i(n) = x(n)W_M^{-in} \tag{9.2}$$

is the modulated version of the input, x(n) (Figure 9.1). Replacing n by −n, Eq. (9.1)
may be rearranged as

$$\bar x_i(k) = \sum_{n=-\infty}^{\infty} h_{-n} x_i(kL+n) \tag{9.3}$$

Substituting Eq. (9.2) in Eq. (9.3), we get

$$\bar x_i(k) = W_M^{-ikL}\sum_{n=-\infty}^{\infty} h_{-n}x(kL+n)W_M^{-in} \tag{9.4}$$

Footnote 1: We note that, in practice, the sequence $h_n$ is always causal (i.e., $h_n = 0$, for n < 0) and has a finite duration.
However, here, we let n vary from −∞ to +∞, to keep the derivations simple.
Now, the method of time aliasing may be applied to the summation on the right-hand side
of Eq. (9.4) for its evaluation in an efficient manner. To this end, we define the sequence

$$u_k(n) = h_{-n}x(kL+n) \tag{9.5}$$

and note that $u_k(n)$ is a windowed version of the input sequence, x(n), the window being
the time reverse of the prototype filter, $h_n$. Using Eq. (9.5) in Eq. (9.4), we get

$$\bar x_i(k) = W_M^{-ikL}\sum_{n=-\infty}^{\infty} u_k(n)W_M^{-in} \tag{9.6}$$

With the change of variable n = r + lM and noting that $W_M^{-ilM} = 1$, Eq. (9.6) may be rearranged as

$$\bar x_i(k) = W_M^{-ikL}\sum_{r=0}^{M-1} u_k^{a}(r)W_M^{-ir} \tag{9.7}$$

where

$$u_k^{a}(r) = \sum_{l=-\infty}^{\infty} u_k(r+lM), \quad \text{for } r = 0, 1, \ldots, M-1 \tag{9.8}$$

We note that the M-point sequence $u_k^{a}(r)$ is obtained by subdividing the sequence $u_k(n)$
into blocks of M samples and stacking and adding (i.e., time aliasing) these blocks.
From Eq. (9.7), we note that the subband signal samples, $\bar x_i(k)$, for i = 0, 1, ..., M − 1,
can be computed simultaneously, once the time-aliased sequence $u_k^{a}(r)$ is obtained. This
is done by applying an M-point DFT to the samples $u_k^{a}(r)$, for r = 0, 1, ..., M − 1,
and multiplying the DFT outputs by the coefficients $W_M^{-ikL}$, as suggested in Eq. (9.7).
Furthermore, computation of the DFT may be performed using an efficient FFT algorithm. 
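
The weighted overlap-add analysis procedure can be checked against the direct definition with a short sketch. It assumes a causal FIR prototype $h_0, \ldots, h_{K-1}$ and treats input samples outside the stored record as zero; the function names and the test data are hypothetical, and a plain M-point DFT sum is used where a practical implementation would call an FFT.

```python
import cmath

def analysis_direct(x, h, M, L, k):
    """Subband samples for i = 0..M-1 computed straight from Eqs. (9.1)-(9.2)."""
    out = []
    for i in range(M):
        acc = 0j
        for n, hn in enumerate(h):
            m = k * L - n                      # fullband index kL - n
            if 0 <= m < len(x):
                acc += hn * x[m] * cmath.exp(-2j * cmath.pi * i * m / M)
        out.append(acc)
    return out

def analysis_wola(x, h, M, L, k):
    """Same samples via windowing, time aliasing, and one M-point DFT."""
    aliased = [0j] * M
    for n, hn in enumerate(h):
        m = k * L - n
        if 0 <= m < len(x):
            # u_k(-n) = h_n x(kL - n) is stacked at position (-n) mod M, Eq. (9.8).
            aliased[(-n) % M] += hn * x[m]
    out = []
    for i in range(M):
        dft_i = sum(aliased[r] * cmath.exp(-2j * cmath.pi * i * r / M)
                    for r in range(M))
        # Post-multiplication by W_M^{-ikL}, Eq. (9.7).
        out.append(cmath.exp(-2j * cmath.pi * i * k * L / M) * dft_i)
    return out

# Hypothetical test data: M = 4 bands, decimation L = 3, 10-tap prototype.
x = [((3 * n) % 7) - 3.0 + 0.5j * ((5 * n) % 4) for n in range(40)]
h = [0.1 * (m + 1) for m in range(10)]
direct = analysis_direct(x, h, 4, 3, 6)
wola = analysis_wola(x, h, 4, 3, 6)
print("max difference:", max(abs(a - b) for a, b in zip(direct, wola)))
```

The direct form costs on the order of M times the prototype length per output block, while the aliased form needs one pass over the prototype plus a single M-point transform, which is where the saving comes from.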


9.1.2 Weighted Overlap—Add Method for Realization of DFT Synthesis 
Filter Banks 


Consider the case where the subband signals $\bar y_i(k)$, for i = 0, 1, ..., M − 1, are to be
synthesized to reconstruct the fullband signal y(n). Also, assume that these subband
signals are in baseband and at a decimated rate L times lower than the fullband rate. To
generate y(n), we may proceed as follows:


1. By appending L − 1 zeros after every sample of subband signals, these signals are
expanded to the fullband rate. This is referred to as interpolation and, accordingly,
L is called the interpolation factor. Interpolation results in a set of fullband signals
whose spectra consist of L repetitions of their associated baseband spectra (see, e.g.,
Oppenheim and Schafer (1989)).

2. The repetitions of the baseband spectra are removed by the lowpass filter.

3. The lowpass filtered fullband signals are then shifted to their respective bands through
appropriate modulators.
The combination of Steps 1 to 3 can be mathematically expressed as

$$y_i(n) = W_M^{in}\sum_{k=-\infty}^{\infty} \bar y_i(k)g_{n-kL}, \quad \text{for } i = 0, 1, \ldots, M-1 \tag{9.9}$$

where the sequence $g_n$ is the impulse response of the lowpass filter, and the coefficients
$W_M^{in}$ are the modulating factors. The fact that the samples added in Step 1, to expand
the subband signal sequences to fullband, are zero has been used to arrive at the special
form of the summation on the right-hand side of Eq. (9.9). Verification of this is left
to the reader as an exercise (Problem P9.1).

4. Finally, the fullband signals $y_i(n)$ are added together to obtain the synthesized
sequence

$$y(n) = \frac{1}{M}\sum_{i=0}^{M-1} y_i(n) \tag{9.10}$$

The factor 1/M, in Eq. (9.10), is added for convenience.


Figure 9.2 presents the block diagram of a synthesis filter bank, where interpolation is 
denoted by an upward arrow followed by the interpolation factor, L. 

To obtain an efficient realization of synthesis filter banks, we proceed as follows. 
Substituting Eq. (9.9) in Eq. (9.10) and rearranging, we obtain 


$$y(n) = \sum_{k=-\infty}^{\infty} g_{n-kL} \, \frac{1}{M} \sum_{i=0}^{M-1} y_i(k)\, W_M^{in} \qquad (9.11)$$

Next, we define the following fullband sequence 


$$\hat{y}_k(n) = g_n \, \frac{1}{M} \sum_{i=0}^{M-1} y_i(k)\, W_M^{i(n+kL)} \qquad (9.12)$$




Figure 9.2 DFT synthesis filter bank. 
Then, using Eq. (9.12), we can write Eq. (9.11) as

$$y(n) = \sum_{k=-\infty}^{\infty} \hat{y}_k(n - kL) \qquad (9.13)$$


That is, the output sequence y(n) is obtained by overlapping and adding the sequences
$\hat{y}_k(n)$, thus the name overlap-add.
Equation (9.12) may also be written as

$$\hat{y}_k(n) = g_n \, \tilde{y}_k(n) \qquad (9.14)$$

where

$$\tilde{y}_k(n) = \frac{1}{M} \sum_{i=0}^{M-1} y_i(k)\, W_M^{ikL}\, W_M^{in} \qquad (9.15)$$

Note that $\tilde{y}_k(n)$ is a periodic function of n with period M as $W_M^{in}$ is periodic in n with
period M, and the rest of the terms on the right-hand side of Eq. (9.15) are independent of
n. Furthermore, it is straightforward to see that the values of $\tilde{y}_k(n)$, for n = 0, 1,..., M - 1
(i.e., the first period of $\tilde{y}_k(n)$) are samples of the inverse DFT of the sequence $y_i(k) W_M^{ikL}$,
for i = 0, 1,..., M - 1.

From the above observation, we may adopt the following procedure to generate the 
samples of the synthesized output sequence, y(n): 


1. Upon the receipt of the latest samples of the subband signals, say $y_i(k)$, for i =
0, 1,..., M - 1, we construct the vector

$$\mathbf{y}(k) = \left[\, y_0(k) \;\; y_1(k) W_M^{kL} \;\; y_2(k) W_M^{2kL} \;\; \cdots \;\; y_{M-1}(k) W_M^{(M-1)kL} \,\right]^{\mathrm T}$$

and compute the inverse DFT of $\mathbf{y}(k)$.

2. The result of this inverse DFT is repeated to generate a periodic sequence. This
makes the sequence $\tilde{y}_k(n)$ of Eq. (9.15).

3. The sequence $\hat{y}_k(n)$ is obtained by multiplying the sequences $\tilde{y}_k(n)$ and $g_n$ on an
element-by-element basis, as in Eq. (9.14). Assuming that $g_n$ is causal, $\hat{y}_k(n)$ will also
be causal.

4. Finally, to generate the samples of y(n), the sequence $\hat{y}_k(n)$ is added to a buffer holding
the accumulated results of the previous iterations, that is, $\sum_{l=-\infty}^{k-1} \hat{y}_l(n - lL)$. The first
L elements of the updated buffer are the samples y(kL), y(kL + 1),..., y(kL + L - 1)
of the synthesized output. While these samples are being sent to the output, the content
of the buffer is shifted and filled with zeros from its other end and becomes ready for
stacking the next set of samples, that is, $\hat{y}_{k+1}(n)$, in the next iteration.
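
The four-step procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: we take $W_M = e^{j2\pi/M}$ (consistent with the modulators of Section 9.2), we use a plain $O(M^2)$ inverse DFT in place of an FFT, and the function names `inverse_dft`, `wola_synthesis`, and `direct_synthesis` are hypothetical helpers introduced here. The second function evaluates Eqs. (9.9) and (9.10) term by term so that the two results can be compared.

```python
import cmath

def inverse_dft(vec):
    """Plain inverse DFT: out[n] = (1/M) * sum_i vec[i] * W^(i*n), W = exp(j*2*pi/M) (assumed)."""
    M = len(vec)
    return [sum(v * cmath.exp(2j * cmath.pi * i * n / M) for i, v in enumerate(vec)) / M
            for n in range(M)]

def wola_synthesis(frames, g, L):
    """frames[k][i] holds y_i(k); g is the causal lowpass prototype g_0, g_1, ..."""
    M = len(frames[0])
    buf = [0j] * max(len(g), L)          # overlap-add buffer
    out = []
    for k, frame in enumerate(frames):
        # Step 1: rotate each subband sample by W^(i*k*L) and take the inverse DFT
        rotated = [y * cmath.exp(2j * cmath.pi * i * k * L / M) for i, y in enumerate(frame)]
        one_period = inverse_dft(rotated)
        # Steps 2 and 3: periodic extension multiplied element by element with g_n
        for n, gn in enumerate(g):
            buf[n] += gn * one_period[n % M]
        # Step 4: release the first L samples, shift, and pad with zeros
        out.extend(buf[:L])
        buf = buf[L:] + [0j] * L
    return out

def direct_synthesis(frames, g, L):
    """Reference: Eqs. (9.9) and (9.10) evaluated directly for n = 0, ..., len(frames)*L - 1."""
    M = len(frames[0])
    out = []
    for n in range(len(frames) * L):
        acc = 0j
        for k, frame in enumerate(frames):
            m = n - k * L
            if 0 <= m < len(g):
                acc += g[m] * sum(y * cmath.exp(2j * cmath.pi * i * n / M)
                                  for i, y in enumerate(frame)) / M
        out.append(acc)
    return out

# Small deterministic example (arbitrary data): M = 4 subbands, L = 2
frames = [[complex(k + 1, i - k) for i in range(4)] for k in range(6)]
g = [0.05, 0.2, 0.4, 0.5, 0.4, 0.2, 0.05, 0.01]
y_wola = wola_synthesis(frames, g, L=2)
y_ref = direct_synthesis(frames, g, L=2)
max_err = max(abs(a - b) for a, b in zip(y_wola, y_ref))
```

The buffer has the length of the prototype filter, so each iteration costs one M-point inverse DFT plus one multiply-add per filter tap, which is where the saving over the direct evaluation comes from.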


9.2 Complementary Filter Banks 


In multirate signal processing, in general, analysis and synthesis filters need to satisfy 
certain conditions in order that the reconstructed fullband signals have no or, at 
least, insignificant distortion. For this to be true in subband adaptive filters, we find 


Figure 9.3 The ith channel of an M-band analysis—synthesis DFT filter bank. 


that the combined responses of the analysis and synthesis filters should be those of 
a complementary filter bank. 

To explain what we mean by a complementary filter bank and also to derive the
conditions required for a filter bank to be complementary, consider the ith channel 
(frequency band) of a pair of analysis—synthesis filter banks, as depicted in Figure 9.3. 
Figure 9.4 presents a set of plots showing the results of the various stages of Figure 9.3. 
Figure 9.4a shows a representative graph of the spectrum of the fullband input,
x(n). The portion of the spectrum of x(n) that is centered around $\omega_i = 2\pi i/M$ is shifted
to $\omega = 0$ and lowpass filtered through the decimator filter, H(z). Let us choose $\omega_i = \pi/2$
and L = 4 for this example. Furthermore, let the lowpass filter H(z) be an ideal filter
with unit gain over the frequency range $-\pi/4 \le \omega \le \pi/4$ and zero elsewhere. Then, the
spectrum of the output of H(z) will be as shown in Figure 9.4b. The decimation, which
compresses the output of H(z) along the time axis, results in expansion of the spectrum
along the frequency axis, as shown in Figure 9.4c. The interpolator, in contrast, expands
the signal samples along the time axis and thus results in compression of the spectrum,
as shown in Figure 9.4d. This leads to L repetitions of the spectrum of the decimated
signal over the range $0 \le \omega \le 2\pi$. The interpolator filter, G(z), selects the baseband part
of the repeated spectrum and rejects its repetitions, thereby recovering the lowpass
spectrum of Figure 9.4b. Finally, the output of G(z) is shifted to its respective band
through a modulator. This results in a fullband signal $x_i(n)$, which is a bandpass-filtered
portion of the input, x(n), as shown in Figure 9.4e.

From the above example, we also note that the effect of the decimator—interpolator 
blocks in Figure 9.3 is to repeat the baseband spectrum of Figure 9.4b, as shown in 
Figure 9.4d. However, as these repetitions are in turn rejected by the synthesis filter, we 
may delete these blocks from Figure 9.3, without affecting its input—output relationship. 
Furthermore, one can easily show that the combination of the modulator stages (i.e., 
multiplication of the input, x(n), by $e^{-j2\pi in/M}$ and of the interpolator filter output by $e^{j2\pi in/M}$)
and the lowpass filters H(z) and G(z) is equivalent to the cascade of the bandpass filters
$H(ze^{-j2\pi i/M})$ and $G(ze^{-j2\pi i/M})$ (Problem P9.2). In an M-band analysis-synthesis filter
bank, there are M such pairs of filters in parallel, as shown in Figure 9.5. For a sequence 
x(n) to pass through this bank of filters without distortion, the overall transfer function 
of the system should resemble that of a pure delay. That is 


$$\sum_{i=0}^{M-1} F(ze^{-j2\pi i/M}) = z^{-\Delta} \qquad (9.16)$$
Figure 9.4 Spectra of the signal sequences at various stages of Figure 9.3: (a) input signal, x(n),
(b) decimator filter output, (c) decimator output, (d) interpolator output, and (e) final output, $x_i(n)$.

Figure 9.5 An equivalent block diagram of an M-band analysis-synthesis DFT filter bank.

Figure 9.6 A pictorial representation of the concept of complementary filter banks.


where F(z) = H(z)G(z), and $\Delta$ is the delay introduced by the cascade of the analysis
and synthesis filters.

When Eq. (9.16) holds, we say that the filter bank is complementary. The complementary
condition (9.16) implies that $\hat{x}(n) = x(n - \Delta)$. That is, the reconstructed signal,
$\hat{x}(n)$, at the synthesis bank output is a delayed replica of the input, x(n). Figure 9.6 gives
a pictorial representation of the concept of complementary filters, where the magnitude
responses of the filters $F(ze^{-j2\pi i/M}) = H(ze^{-j2\pi i/M})\,G(ze^{-j2\pi i/M})$ of a four-band filter
bank are plotted, for i = 0, 1, 2, and 3. As shown, there is some overlap among neighboring
filters. However, the filters are chosen so that the overall response adds up to unity
across the fullband.

Figure 9.6 as well as Eq. (9.16) states the condition in frequency domain that should 
be satisfied for the filter bank to be complementary. In the design of complementary 
filter banks, however, we often find that it is more convenient to work with time domain 
constraints. So, we convert the constraint specified by Eq. (9.16) to its equivalent in time
domain. For this, we define the sequence $f_n$ as the inverse z-transform of F(z). That is,

$$F(z) = \sum_{n=-\infty}^{\infty} f_n z^{-n} \qquad (9.17)$$

Using Eq. (9.17) in Eq. (9.16) and rearranging, we obtain

$$\sum_{n=-\infty}^{\infty} \left( \sum_{i=0}^{M-1} e^{j2\pi in/M} \right) f_n z^{-n} = z^{-\Delta} \qquad (9.18)$$

Furthermore, it is straightforward to show that

$$\sum_{i=0}^{M-1} e^{j2\pi in/M} = \begin{cases} M, & \text{when } n \text{ is a multiple of } M \\ 0, & \text{otherwise} \end{cases} \qquad (9.19)$$

Using Eq. (9.19) in Eq. (9.18), we find that Eq. (9.18) can only be satisfied when $\Delta = KM$,
where K is a positive integer, and

$$f_n = \begin{cases} 1/M, & n = KM \\ 0, & n = \text{all multiples of } M \text{ except } KM \\ \text{unspecified}, & \text{otherwise} \end{cases} \qquad (9.20)$$


Thus, the value of K determines the total delay introduced by the filter bank. 
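
The time-domain condition (9.20) can be checked numerically. The following sketch (the function names and the test sequence are hypothetical and chosen only for illustration) builds a sequence $f_n$ whose taps at multiples of M follow Eq. (9.20), fills the remaining "unspecified" taps with arbitrary values, and confirms that the sum on the left-hand side of Eq. (9.16), evaluated on the unit circle, reduces to a pure delay of KM samples.

```python
import cmath

def exponential_sum(n, M):
    """Left-hand side of Eq. (9.19): sum over i of exp(j*2*pi*i*n/M)."""
    return sum(cmath.exp(2j * cmath.pi * i * n / M) for i in range(M))

def bank_response(f, M, omega):
    """Sum over i of F(z*exp(-j*2*pi*i/M)) at z = exp(j*omega), with F(z) = sum_n f[n] z^(-n)."""
    total = 0j
    for i in range(M):
        zi = cmath.exp(1j * omega) * cmath.exp(-2j * cmath.pi * i / M)
        total += sum(fn * zi ** (-n) for n, fn in enumerate(f))
    return total

M, K = 4, 2
# Arbitrary values stand in for the "unspecified" taps of Eq. (9.20)
f = [0.01 * ((7 * n) % 11 - 5) for n in range(17)]
for n in range(0, 17, M):
    f[n] = 0.0            # multiples of M are forced to zero ...
f[K * M] = 1.0 / M        # ... except n = K*M, which is set to 1/M

# Deviation from the pure delay exp(-j*omega*K*M) at a few test frequencies
worst = max(abs(bank_response(f, M, w) - cmath.exp(-1j * w * K * M))
            for w in (0.0, 0.3, 1.1, 2.5, 3.0))
```

Note that the check passes regardless of the values placed in the unspecified taps, which is exactly the freedom that the filter design of Section 9.7 exploits.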


9.3 Subband Adaptive Filter Structures 


Figure 9.7 depicts the schematic of a commonly used structure of subband adaptive filters.²
The adaptive filter is used to model a plant, $W_o(z)$. The input, x(n), and the plant output,
d(n), are passed through a pair of identical analysis filter banks to be partitioned into M
subbands and decimated to a rate that is 1/L of the fullband rate. The subband adaptive
filters, $W_i(z)$, are thus running at a rate that is only 1/L of the fullband rate. To generate
the adaptive filter output in fullband, the outputs from the subband filters are combined 
together through a synthesis filter bank. 

The subband adaptive filter structure presented in Figure 9.7 is referred to as synthesis 
independent, as the adaptation of the subband filters is independent of the synthesis 
filters. The assumption here is that the synthesis filters are ideal, in the sense that their 
stop-band attenuation is infinity and their cascade with the analysis filters results in a 
complementary filter bank. In practice, these ideal requirements can be satisfied only 
approximately. Hence, the synthesis-independent subband adaptive filters are bound to 
have some distortion. This distortion can be reduced using an alternative structure, which 
is known as synthesis-dependent subband adaptive filter. This is shown in Figure 9.8. The 
delay $\Delta$ is to account for the combined delay due to analysis and synthesis filters. In this
structure, even though the filtering is still done in subbands, the computation of output 


2 The concept of subband adaptive filtering was first introduced by Furukawa (1984) and Kellermann (1984 and
1985).

Figure 9.7 Subband adaptive filter (synthesis-independent structure).


error, e(n), is done in fullband. The fullband error, e(n), is subsequently partitioned into 
subbands using an analysis filter bank, and the subband errors, $e_i(k)$, are used for the
adaptation of the associated subband filters. 

Although the synthesis-dependent structure resolves the distortion introduced by the
synthesis filters, it has some drawbacks that hinder its application in practice (Sondhi
and Kellermann, 1992). In particular, the cascade of synthesis and analysis filter banks
in the adaptation loop introduces an undesirable delay that makes the filter more prone 
to instability. Furthermore, the presence of a delay in the adaptation loop increases 
the memory requirement of the filter (Problem P9.3). Because of these problems, the 
synthesis-dependent subband adaptive filter structure has been less popular than its 
synthesis-independent counterpart. Noting this, our emphasis in the rest of this chapter 
will be on the synthesis-independent structure. Nevertheless, most of the results we 
develop are applicable to the synthesis-dependent structure as well. 


9.4 Selection of Analysis and Synthesis Filters 


Design of analysis and synthesis filters with well-behaved responses is crucial to suc- 
cessful implementation of subband adaptive filters. In this section, we look into the basic 
requirements of the analysis and synthesis filters. We note that there are many requirements 
that should be taken into account while selecting these filters and hence a compromise has 
to be struck to achieve an acceptable design. As a result, it is very difficult to give any 
Figure 9.8 Subband adaptive filter (synthesis-dependent structure).


specific criterion whose optimization will lead to the optimum set of filters. Instead, we 
find it more appropriate to deal with this problem in a subjective manner, which would 
lead us to a number of specifications for a good compromise design. 

As was noted earlier in Section 9.2, for the reconstructed output of a subband adaptive 
structure to have small distortion, the analysis and synthesis filters should form a comple- 
mentary filter bank. There are many pairs of analysis and synthesis filter banks that satisfy 
the complementary condition. This provides some degrees of freedom, which may be used 
to facilitate the design and/or enhance the performance of the subband adaptive filters. 

A first attempt may be to use the same prototype filter for both analysis and synthesis. 
Unfortunately, this leads to subband signals whose spectra vary and decay to small
values near the ends of their respective bands. This, in turn, causes the inputs to the
subband adaptive filters to be badly conditioned because of the low excitation levels near
the band edges. Furthermore, from our discussions in the previous chapters, we know
that such inputs result in large eigenvalue spreads and thus poor convergence. This
problem may be resolved as follows (Morgan, 1995; De Leon and Etter, 1995):
Figure 9.9 A possible choice of analysis and synthesis prototype filters that resolves the problem
of slow convergence of subband adaptive filters.


Figure 9.9 presents a diagram showing a good choice of analysis and synthesis prototype
filters that resolves the problem of slow convergence of subband adaptive filters.
The analysis prototype filter is chosen such that it has a flat magnitude response and
linear phase response (constant group-delay³) between zero and a frequency larger than
or equal to $\omega_{ss}$, where $\omega_{ss}$ is the beginning of the stop-band (i.e., the end of the transition
band) of the synthesis prototype filter. Moreover, the synthesis filters are chosen to be
complementary. The cascade of the analysis and synthesis filters will then be a complementary
filter bank because in this case the multiplication (cascade) of the analysis and
synthesis prototype filters is just the same as the synthesis prototype filter. The analysis
filters introduce only a fixed delay in the overall response of the subband structure.

Next, we explain why the choice of the analysis and synthesis prototype filters, as
shown in Figure 9.9, resolves the problem of poor convergence of subband adaptive
filters. Assuming that the power spectral density of the fullband input, x(n), does not vary
significantly over each subband, using an analysis prototype filter similar to the one shown
in Figure 9.9 gives all decimated subband sequences approximately flat
spectra over the range of frequencies $|\omega| \le \omega_{ss}$. On the other hand, the band of interest
over which matching between the frequency response of each subband adaptive filter and
its associated desired response from the respective band of the plant should be achieved
is $|\omega| \le \omega_{ss}$, as frequencies beyond this are cut off by the synthesis filters (Figure 9.9).
We may thus say that in a subband adaptive filter structure whose analysis and synthesis
prototype filters are selected as shown in Figure 9.9, all the subband filters will be well
excited over their respective bands of interest, and hence there will not be any slow mode
that may affect the convergence behavior of the overall filter.

Another consideration that should be noted in the implementation of subband adaptive 
filters, and hence in the design of analysis and synthesis filters, is the problem of delay (or 
latency) in the filter output, y(n). This delay is caused by the analysis and synthesis filters. 
Minimization of this delay is exceedingly important as the maximum delay permitted in 
many applications is often very much limited. For instance, in the application of acoustic 


3 The group-delay of a system is defined as the negative of the derivative of its phase response with respect to the angular frequency, $\omega$.




echo cancellation (AEC), around which most of the theory of subband adaptive filters
has been developed, the maximum delay allowed is usually very small.⁴ The factors
that influence the delay introduced by the analysis and synthesis filters are the number 
of subbands, M, the decimation factor, L, the accuracy of analysis and synthesis filters 
(which may be defined in terms of their stop-band attenuation and pass-band ripple), and 
also the criterion used in designing analysis and synthesis filters. The last two issues are 
addressed in Section 9.7, where a method for designing analysis and synthesis filters with 
small delay is given. 

The delay increases with the number of subbands, M. On the other hand, we may recall
from our previous discussion that the idea of subband adaptive filtering is to partition the
input signal into a number of narrow bands such that the signal spectrum is approximately
flat over each band, thus giving an implementation that does not suffer from large
eigenvalue spreads. Hence, from a convergence point of view, larger values of M are
preferred. The choice of the decimation factor, L, also affects the selection of the analysis
and synthesis filters and hence the delay. In general, the delay increases with L as well.
On the other hand, the computational complexity of a subband adaptive structure decreases
as L increases. Thus, a compromise has to be struck while choosing L and M.

From the above discussion, we find that the selection of the analysis and synthesis 
filters is not a straightforward or a clearly formulated problem. On the one hand, we 
should make sure that the delay introduced by the analysis and synthesis filters does not 
exceed a specified value. This is usually specified as one of the design requirements. On 
the other hand, we may choose the number of subbands, M, and the decimation factor, 
L, as large as possible, while designing analysis and synthesis filters with some (loosely 
defined) acceptable aspects. A procedure for the design of analysis and synthesis filters 
as well as for the selection of values of L and M and the other parameters of the subband 
adaptive filters is given in Sections 9.7 and 9.8. 


9.5 Computational Complexity 


Computational complexity of subband adaptive filters, in general, decreases as the dec- 
imation factor, L, increases. To explore the impact of L in reducing the computational 
complexity of a subband adaptive filter, let us consider the implementation of an adaptive 
filter whose fullband implementation requires N taps. We also assume that the filter input, 
x(n), and the desired signal, d(n), are real-valued. The number of taps required for each 
subband filter is then N/L, because in subbands, each sample interval is equivalent to L 
sample intervals in the fullband. We also note that although the input, x(n), is real-valued,
the subband signals are, in general, complex-valued. They also appear in complex-conjugate
pairs, with the exceptions of bands 0 and M/2 (we assume that M is even),
whose corresponding inputs are real-valued when the input, x(n), is real-valued. Considering
these two bands as one band with complex-valued input, the computational complexity
of a subband adaptive filter with N real-valued fullband taps may be evaluated based on
(M/2)(N/L) = MN/(2L) complex-valued subband taps. Furthermore, we note that processing
in subbands is done at a rate that is only 1/L of the fullband rate. Noting these


4 In the ITU-T standard G.167, it is stated that “for end-to-end digital communications (e.g., wide-band teleconference
systems), the delay shall be no more than 16 ms in each direction of speech transmission.”

and counting each complex-valued tap as equivalent to four real-valued taps, we obtain


$$\frac{\text{Complexity of subband filter}}{\text{Complexity of fullband filter}} = \frac{2M}{L^2} \qquad (9.21)$$


This result does not include the complexity of the analysis and synthesis filters. However, 
in applications where subband adaptive filters are found useful, the filter length, N, is 
usually very large, in the range of 1000 or above. For such values of N, the contribution 
from the complexity of analysis and synthesis filters is not that significant (usually in 
the range of 20% or less). 
In typical designs, one of which is given in Section 9.9, we usually find that $L \approx M/2$.
Thus, we obtain

$$\frac{\text{Complexity of subband filter}}{\text{Complexity of fullband filter}} \approx \frac{4}{L} \approx \frac{8}{M} \qquad (9.22)$$
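
The counting argument behind Eqs. (9.21) and (9.22) can be retraced step by step. The sketch below uses hypothetical function names and example values (N = 1024, M = 32, L = 16) that are chosen only for illustration; like Eq. (9.21), it ignores the cost of the analysis and synthesis filter banks.

```python
def complexity_ratio(M, L):
    """Eq. (9.21): subband-to-fullband complexity ratio, filter banks excluded."""
    return 2.0 * M / L ** 2

def complexity_ratio_from_counts(N, M, L):
    """Same ratio, retracing the tap count described in the text."""
    complex_taps = (M / 2) * (N / L)        # MN/(2L) complex-valued subband taps
    real_equivalent = 4 * complex_taps      # one complex tap counted as four real taps
    per_fullband_sample = real_equivalent / L   # subband filters run at 1/L of the fullband rate
    return per_fullband_sample / N          # fullband filter: N real taps per sample

# Illustrative values with L = M/2, so the ratio should equal 8/M
ratio = complexity_ratio(M=32, L=16)
ratio_check = complexity_ratio_from_counts(N=1024, M=32, L=16)
```

With L = M/2 both functions give 8/M, so under these assumptions the adaptive part of a 32-band structure costs one quarter of its fullband counterpart.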


9.6 Decimation Factor and Aliasing 


With the choice of the analysis and synthesis filters as shown in Figure 9.9, the largest 
value of L that may be used without causing aliasing of signal spectra over the bands of 
interest, that is, those selected by the synthesis filters, is given by 


$$L_{\max} = \left\lfloor \frac{2\pi}{\omega_{sa} + \omega_{ss}} \right\rfloor \qquad (9.23)$$

where $\lfloor x \rfloor$ denotes the largest integer smaller than or equal to x; $\omega_{sa}$ and $\omega_{ss}$, as
indicated in Figure 9.9, denote the ends of the transition bands of the analysis and
synthesis prototype filters, respectively. This choice of L will result in some aliasing in
the outputs of analysis filters. However, the aliased portions of the spectra are those that
will be filtered out by the synthesis filters, and hence will not affect the fullband output
of the filter. Figure 9.10 illustrates this, where we have plotted the magnitude responses


of the analysis and synthesis filters after decimation. The selection of $L = L_{\max}$,
although it may seem quite reasonable at first glance, has some drawbacks when it
comes to adaptation of the subband filters. It results in significant augmentation of the
misadjustment, as explained next.

Figure 9.10 Magnitude responses of the analysis and synthesis filters after decimation, illustrating
the fact that even though the decimated output samples of the analysis filters are aliased, the portion
of the signal spectrum that is filtered by the synthesis filter is free of aliasing.
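
As a small worked example of Eq. (9.23), the sketch below computes the largest admissible decimation factor from the two stop-band edges. The function name and the numerical band edges are assumptions made here for illustration; they are not taken from a particular design in this chapter.

```python
import math

def max_decimation_factor(omega_sa, omega_ss):
    """Eq. (9.23): floor(2*pi / (omega_sa + omega_ss)).

    omega_sa, omega_ss: ends of the transition bands (in radians per sample)
    of the analysis and synthesis prototype filters, respectively.
    """
    return math.floor(2 * math.pi / (omega_sa + omega_ss))

# Hypothetical band edges: 2*pi / (0.55*pi) is about 3.64, so L_max is 3
L_max = max_decimation_factor(omega_sa=0.35 * math.pi, omega_ss=0.20 * math.pi)
```

As the discussion that follows explains, this value is only an upper bound; in practice a smaller L is usually selected to keep the misadjustment under control.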

The Fourier transform of the desired signal, $d_i(k)$, in the ith subband is given by

$$D_i(e^{j\omega}) = \sum_{m=i-1}^{i+1} X\!\left(e^{j\frac{\omega - 2\pi m}{L}}\right) W_o\!\left(e^{j\frac{\omega - 2\pi m}{L}}\right) H\!\left(e^{j\frac{\omega - 2\pi m}{L}}\right) \qquad (9.24)$$

where $X(e^{j\omega})$ is the Fourier transform of the input, x(n), and $W_o(e^{j\omega})$ and $H(e^{j\omega})$ are the
frequency responses of the plant and the analysis prototype filter, respectively. The three
terms contributing to the spectrum of $d_i(k)$ are (i) the ith band spectrum, m = i, (ii) the
aliased spectrum from the immediately following band, m = i + 1, and (iii) the aliased
spectrum from the immediately preceding band, m = i - 1. The division of the frequency,
$\omega$, by L is due to the spectral expansion because of the L-fold decimation. Similarly, the
Fourier transform of the output, $y_i(k)$, of the ith subband filter is obtained as

$$Y_i(e^{j\omega}) = W_i(e^{j\omega}) \sum_{m=i-1}^{i+1} X\!\left(e^{j\frac{\omega - 2\pi m}{L}}\right) H\!\left(e^{j\frac{\omega - 2\pi m}{L}}\right) \qquad (9.25)$$


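Using Eqs. (9.24) and (9.25), the Fourier transform of the subband error sequence
$e_i(k) = d_i(k) - y_i(k)$ is obtained as

$$\begin{aligned}
E_i(e^{j\omega}) &= D_i(e^{j\omega}) - Y_i(e^{j\omega}) \\
&= \left[ W_o\!\left(e^{j\frac{\omega - 2\pi i}{L}}\right) - W_i(e^{j\omega}) \right] H\!\left(e^{j\frac{\omega - 2\pi i}{L}}\right) X\!\left(e^{j\frac{\omega - 2\pi i}{L}}\right) \\
&\quad + \left[ W_o\!\left(e^{j\frac{\omega - 2\pi (i+1)}{L}}\right) - W_i(e^{j\omega}) \right] H\!\left(e^{j\frac{\omega - 2\pi (i+1)}{L}}\right) X\!\left(e^{j\frac{\omega - 2\pi (i+1)}{L}}\right) \\
&\quad + \left[ W_o\!\left(e^{j\frac{\omega - 2\pi (i-1)}{L}}\right) - W_i(e^{j\omega}) \right] H\!\left(e^{j\frac{\omega - 2\pi (i-1)}{L}}\right) X\!\left(e^{j\frac{\omega - 2\pi (i-1)}{L}}\right)
\end{aligned} \qquad (9.26)$$

Inspection of Eq. (9.26) reveals that to minimize

$$E[|e_i(k)|^2] = \frac{1}{2\pi} \int_{-\pi}^{\pi} |E_i(e^{j\omega})|^2 \, d\omega$$

$W_i(e^{j\omega})$ has to be selected so that the three differences in the brackets on the right-hand
side of Eq. (9.26) reduce to some small values. Moreover, we note that the frequency
interval $-\pi \le \omega \le \pi$ may be divided into three distinct ranges. The first range, defined as
$-\omega_1 \le \omega \le \omega_1$, is where there is no overlap between $H(e^{j(\omega - 2\pi i)/L})$ and its neighbors
$H(e^{j(\omega - 2\pi(i-1))/L})$ and $H(e^{j(\omega - 2\pi(i+1))/L})$. In this range, the last two terms on the right-hand
side of Eq. (9.26) are zero as this range coincides with the stop-bands of $H(e^{j(\omega - 2\pi(i+1))/L})$
and $H(e^{j(\omega - 2\pi(i-1))/L})$. The first term can also be made small by choosing $W_i(e^{j\omega})$ to be
close to the plant response, $W_o(e^{j(\omega - 2\pi i)/L})$, for $-\omega_1 \le \omega \le \omega_1$. It is important to note that
a selection of $L \le L_{\max}$ implies $L\omega_{ss} \le \omega_1$. This in turn means that the portions of
the plant response that are picked up by the synthesis filters can be modeled well by the
subband adaptive filters.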
The second range, $\omega_1 < \omega \le \pi$, is where the filters attached to the first two terms on the
right-hand side of Eq. (9.26), that is, $H(e^{j(\omega - 2\pi i)/L})$ and $H(e^{j(\omega - 2\pi(i+1))/L})$, overlap. In this range,
for $|E_i(e^{j\omega})|$ to reduce to a small value, $W_i(e^{j\omega})$ has to match with two different parts
of the plant response, $W_o(e^{j(\omega - 2\pi i)/L})$ and $W_o(e^{j(\omega - 2\pi(i+1))/L})$, for $\omega_1 < \omega \le \pi$. This, of
course, may not be possible, as $W_o(e^{j(\omega - 2\pi i)/L}) \ne W_o(e^{j(\omega - 2\pi(i+1))/L})$, in general. Similarly, in the third
range also, where $-\pi \le \omega < -\omega_1$, it may not be possible to reduce $|E_i(e^{j\omega})|$ to a small
value. As a result of these mismatches, $E[|e_i(k)|^2]$ may be very significant, even after the
convergence of the subband adaptive filter. This will result in large perturbation of the tap
weights because of the use of stochastic gradients, thereby increasing the misadjustment
of these filters, unless a very small step-size is used to reduce the level of perturbations.
But, reducing the step-size is undesirable, as it proportionately reduces the convergence
rate of the adaptive filter. Another solution that has been proposed to solve this problem
is to add cross-filters between the neighboring subbands (Gilloire and Vetterli, 1992).⁵
However, this increases the system complexity, and hence is not acceptable because the
main goal of increasing L was to reduce the complexity. Yet another solution, which is
found to be more appropriate than the others, is to select the decimation factor, L, so
that the overlapping of the adjacent analysis filters is limited only to those portions of
the analysis filter responses that are below a certain level (Farhang-Boroujeny and Wang,
1997). However, no fixed value may be specified for this “level.” It is a loosely defined
design parameter that can only be found experimentally. Thus, a compromise value of L
could only be selected through a trial-and-error design process (Section 9.8).


9.7 Low-Delay Analysis and Synthesis Filter Banks 


In this section, we present a method for designing analysis and synthesis filters with low 
group-delay.⁶ As was noted before, design of low-delay filters is desirable in subband
adaptive filters as it reduces the latency of the overall filter response. 


9.7.1 Design Method 


From our discussion in Section 9.4, we recall that the analysis and synthesis filters should 
have good attenuation in their stop-bands. The problem of designing an optimum FIR 
filter with maximum attenuation in the stop-band may be formulated as follows. 
Consider an FIR filter with length $N_a$, and tap weights given by the real-valued
coefficient-vector

$$\mathbf{a} = [a_0 \;\; a_1 \;\; \cdots \;\; a_{N_a-1}]^{\mathrm T}$$

where the superscript T denotes the transposition. Then, the transfer function of the FIR
filter is given by

$$A(e^{j\omega}) = \mathbf{a}^{\mathrm T} \boldsymbol{\Omega} \qquad (9.27)$$

where

$$\boldsymbol{\Omega} = [1 \;\; e^{-j\omega} \;\; e^{-j2\omega} \;\; \cdots \;\; e^{-j(N_a-1)\omega}]^{\mathrm T}$$

5 It shall be noted that the purpose behind the use of cross-filters by Gilloire and Vetterli (1992) was to resolve
the problem of perfect reconstruction, which is different from our aim here. Nevertheless, the concepts discussed
there, with minor modifications, may also be applied to suit the implementation presented in this chapter.

6 The design method presented here is from Farhang-Boroujeny and Wang (1997). It follows the idea of Mueller
(1973) who used the same method for designing Nyquist filters for data transmission purposes. Vaidyanathan and
Nguyen (1987) have also proposed a similar method (with some extensions) and called the resulting designs
eigenfilters. However, neither Mueller nor Vaidyanathan and Nguyen emphasized low-delay filters.


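Suppose we want this filter to have its stop-band begin at $\omega_s$. Then, the total energy
in the stop-band is given by

$$E_s = \frac{1}{2\pi} \int_{\omega_s}^{2\pi - \omega_s} |A(e^{j\omega})|^2 \, d\omega \qquad (9.28)$$

Substituting Eq. (9.27) in Eq. (9.28), we obtain

$$E_s = \mathbf{a}^{\mathrm T} \boldsymbol{\Phi}\, \mathbf{a} \qquad (9.29)$$

where

$$\boldsymbol{\Phi} = \frac{1}{2\pi} \int_{\omega_s}^{2\pi - \omega_s} \boldsymbol{\Omega} \boldsymbol{\Omega}^{\mathrm H} \, d\omega \qquad (9.30)$$

and the superscript H denotes the Hermitian transposition. We note that $\boldsymbol{\Phi}$ is an $N_a$-by-$N_a$
matrix whose klth element is

$$\phi_{kl} = \frac{1}{2\pi} \int_{\omega_s}^{2\pi - \omega_s} e^{-j(k-l)\omega} \, d\omega = \begin{cases} 1 - \dfrac{\omega_s}{\pi}, & k = l \\[2mm] -\dfrac{\sin[(k-l)\omega_s]}{\pi (k-l)}, & k \ne l \end{cases} \qquad (9.31)$$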

The optimum coefficients of the FIR filter are those that minimize the energy function
$E_s$ of Eq. (9.29). To prevent the trivial solution $a_i = 0$, for $i = 0, 1, \ldots, N_a - 1$, we
impose the constraint $\mathbf{a}^{\mathrm T}\mathbf{a} = 1$. The problem of minimizing $E_s$ with respect to the vector
$\mathbf{a}$, subject to the constraint $\mathbf{a}^{\mathrm T}\mathbf{a} = 1$, is a standard eigenproblem whose optimum solution
is the eigenvector of $\boldsymbol{\Phi}$ that corresponds to its minimum eigenvalue; see Property 7 of
eigenvalues and eigenvectors in Chapter 4.
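
To make the quadratic form of Eq. (9.29) concrete, the following sketch builds the matrix of Eq. (9.31) in closed form and compares $\mathbf{a}^{\mathrm T}\boldsymbol{\Phi}\mathbf{a}$ with a direct numerical evaluation of the stop-band energy integral of Eq. (9.28). The function names, the test vector, and the midpoint-rule integration are assumptions introduced here for illustration; the eigenvector computation itself is not shown and would normally be delegated to a numerical library.

```python
import math

def phi_matrix(Na, omega_s):
    """Closed form of Eq. (9.31) for an Na-by-Na matrix."""
    phi = [[0.0] * Na for _ in range(Na)]
    for k in range(Na):
        for l in range(Na):
            if k == l:
                phi[k][l] = 1.0 - omega_s / math.pi
            else:
                d = k - l
                phi[k][l] = -math.sin(d * omega_s) / (math.pi * d)
    return phi

def stopband_energy_quadratic(a, omega_s):
    """Eq. (9.29): a^T * Phi * a."""
    phi = phi_matrix(len(a), omega_s)
    return sum(a[k] * phi[k][l] * a[l] for k in range(len(a)) for l in range(len(a)))

def stopband_energy_numeric(a, omega_s, steps=20000):
    """Eq. (9.28) by the midpoint rule over omega_s <= omega <= 2*pi - omega_s."""
    h = (2 * math.pi - 2 * omega_s) / steps
    total = 0.0
    for s in range(steps):
        w = omega_s + (s + 0.5) * h
        re = sum(an * math.cos(n * w) for n, an in enumerate(a))
        im = -sum(an * math.sin(n * w) for n, an in enumerate(a))
        total += (re * re + im * im) * h
    return total / (2 * math.pi)

# Arbitrary five-tap test filter and stop-band edge
a = [0.1, 0.3, 0.5, 0.3, 0.1]
omega_s = 0.4 * math.pi
e_quad = stopband_energy_quadratic(a, omega_s)
e_num = stopband_energy_numeric(a, omega_s)
```

The two values agree to within the accuracy of the numerical integration, which confirms that minimizing the quadratic form is the same as minimizing the stop-band energy.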

We recall that the synthesis filters have to be complementary. Moreover, as we shall 
see later, the complementary filters are also appropriate for use as analysis filters. To 
adopt the above procedure for designing the complementary filters, we recall the M-band
complementary condition (9.20). This condition is repeated below in terms of the
coefficients $a_i$, for $i = 0, 1, \ldots, N_a - 1$, of the filter $A(e^{j\omega})$:

$$a_i = \begin{cases} 1/M, & i = KM \\ 0, & i = \text{all multiples of } M \text{ except } KM \\ \text{unspecified}, & \text{otherwise} \end{cases} \qquad (9.32)$$


where K is a constant integer that determines the group-delay of the filter bank, as 
discussed in Section 9.2. 

To satisfy the conditions stated in Eq. (9.32), we may simply drop those $a_i$'s that have
to be zero from Eq. (9.29). The new energy function to be minimized is then

$$\tilde{E}_s = \tilde{\mathbf{a}}^{\mathrm T} \tilde{\boldsymbol{\Phi}}\, \tilde{\mathbf{a}}, \quad \text{subject to the constraint } \tilde{\mathbf{a}}^{\mathrm T}\tilde{\mathbf{a}} = 1 \qquad (9.33)$$

where $\tilde{\mathbf{a}}$ is obtained from $\mathbf{a}$ by deleting those elements that should be constrained to zero,
and $\tilde{\boldsymbol{\Phi}}$ is obtained from $\boldsymbol{\Phi}$ by deleting the corresponding rows and columns so as to be made
compatible with $\tilde{\mathbf{a}}$. The minimization of $\tilde{E}_s$ is also an eigenproblem. Its solution is the
eigenvector of $\tilde{\boldsymbol{\Phi}}$ that corresponds to its minimum eigenvalue. The desired vector $\mathbf{a}$ is
obtained from $\tilde{\mathbf{a}}$ by inserting the dropped zeros in the appropriate locations. Finally,
to satisfy the condition $a_{KM} = 1/M$ of Eq. (9.32), a simple scaling is applied to $\mathbf{a}$.
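
The bookkeeping of this constrained design, namely which coefficients are dropped, how the matrix is reduced, and how the zeros are reinserted before scaling, can be sketched as follows. All function names are hypothetical, and the reduced vector used in the example is an arbitrary stand-in for the minimum-eigenvalue eigenvector, which is assumed to come from a separate eigen-solver.

```python
def free_indices(Na, M, K):
    """Indices kept in the reduced vector: all taps except multiples of M other than K*M."""
    return [i for i in range(Na) if i % M != 0 or i == K * M]

def reduce_matrix(phi, kept):
    """Delete the rows and columns of phi that correspond to the dropped taps."""
    return [[phi[r][c] for c in kept] for r in kept]

def expand_and_scale(a_reduced, Na, M, K):
    """Reinsert the zeros and scale so that a[K*M] = 1/M, as required by Eq. (9.32).

    Assumes the entry of a_reduced that maps to index K*M is nonzero.
    """
    kept = free_indices(Na, M, K)
    a = [0.0] * Na
    for idx, value in zip(kept, a_reduced):
        a[idx] = value
    scale = (1.0 / M) / a[K * M]
    return [scale * v for v in a]

# Toy sizes: Na = 9 taps, M = 4 bands, K = 1, so taps 0 and 8 are forced to zero
kept = free_indices(9, 4, 1)
a_reduced = [0.1, 0.2, 0.3, 0.5, 0.3, 0.2, 0.1]   # stand-in for the eigenvector
a_full = expand_and_scale(a_reduced, 9, 4, 1)
```

Because an eigenvector is defined only up to a scale factor, the final scaling does not alter the shape of the frequency response obtained from the minimization.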

One may note that the above procedure does not specify any specific range of frequencies
for the pass-band and transition band. Only the stop-band is specified. To be
more accurate on this, we recall that in an M-band complementary filter bank, the frequency
$\omega = \pi/M$ is located at the middle of the transition band of its prototype filter; see
Figure 9.6 as an example. The stop-band of the prototype filter begins at $(1+\alpha)\pi/M$,
where $\alpha$, known as the roll-off factor, determines the widths of the pass-band and transition
band. The pass-band of the prototype filter is given as $0 \le \omega \le (1-\alpha)\pi/M$ and
the transition band as $(1-\alpha)\pi/M < \omega < (1+\alpha)\pi/M$. The numerical examples given
next and the supporting discussions show that, for the filters designed by the proposed
method, the pass-, stop-, and transition bands will be clearly separated according to the
above boundaries, once $\omega_s$ is set equal to $(1+\alpha)\pi/M$.


9.7.2 Filters Properties 


In this subsection, we look at the main features of the filters designed by the method 
presented above. We recall that an M-band complementary filter bank with the prototype
filter $A(e^{j\omega})$ and the parameter K as specified in Eq. (9.32) satisfies the identity

$$\sum_{i=0}^{M-1} A\!\left(e^{j(\omega - 2\pi i/M)}\right) = e^{-j\omega KM} \qquad (9.34)$$

On the other hand, the design procedure given in the last subsection emphasizes only
the stop-band of the prototype filter $A(e^{j\omega})$; there is no clear emphasis on how
the pass-band and the transition band of $A(e^{j\omega})$ are separated. Next, we show that the
boundary between these two bands can be easily identified once the design parameters
M and $\omega_s$ are known.

From our discussion in Section 9.2, we recall that the midpoint of the transition band
of the prototype filter of an M-band complementary filter bank is $\omega = \pi/M$. Moreover,
from the pictorial representation of Figure 9.6, it is straightforward to conclude that if
$\omega_p$ and $\omega_s$ are, respectively, the end of the pass-band and the beginning of the stop-band of the
prototype filter of an M-band filter bank, then

$$\frac{\pi}{M} = \frac{\omega_p + \omega_s}{2} \tag{9.35}$$

Hence, when M and $\omega_s$ are given, $\omega_p$ is obtained as

$$\omega_p = \frac{2\pi}{M} - \omega_s \tag{9.36}$$


In the discussion that follows, it is convenient to specify $\omega_s$ in terms of the midpoint
frequency $\pi/M$ as

$$\omega_s = (1+\alpha)\frac{\pi}{M} \tag{9.37}$$

where $\alpha$ is a positive parameter that specifies the width of the transition band of the
prototype filter of the filter bank as explained below. The parameter $\alpha$, as noted above,
is known as the roll-off factor.

Substituting Eq. (9.37) in Eq. (9.36), we get

$$\omega_p = (1-\alpha)\frac{\pi}{M} \tag{9.38}$$

Also, the width of the transition band of the prototype filter is obtained as

$$\omega_s - \omega_p = \frac{2\pi\alpha}{M} \tag{9.39}$$
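The band-edge relations of Eqs. (9.36) to (9.39) can be checked numerically. The following is a minimal sketch, not part of the original design program; the function name `band_edges` is hypothetical, and the values M = 4 and roll-off 0.25 are simply those used in the example of Figure 9.11.

```python
import math

def band_edges(M, alpha):
    """Return (omega_p, omega_s, width) in rad/sample for an M-band prototype filter.

    Hypothetical helper: M is the number of bands and alpha the roll-off factor.
    """
    midpoint = math.pi / M              # midpoint of the transition band
    omega_s = (1 + alpha) * midpoint    # Eq. (9.37): beginning of the stop-band
    omega_p = 2 * midpoint - omega_s    # Eq. (9.36): end of the pass-band
    return omega_p, omega_s, omega_s - omega_p

omega_p, omega_s, width = band_edges(M=4, alpha=0.25)
# Print the edges as fractions of pi for easier reading.
print(omega_p / math.pi, omega_s / math.pi, width / math.pi)
```

For these values the pass-band ends at about $0.1875\pi$ and the stop-band begins at about $0.3125\pi$, so the transition band is about $0.125\pi$ wide, in agreement with Eq. (9.39).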


Equation (9.34) explicitly states that the group-delay introduced by the filter bank is
KM. In most subband adaptive filtering applications, we want to keep this delay as small
as possible. On the other hand, the optimum K that results in maximum attenuation in the
stop-band is obtained by choosing K so that KM is the nearest multiple of M to N/2,
where N denotes the filter length. However, this delay is generally large, and thus we would
instead strike a compromise between delay and stop-band attenuation. That is, we may accept
a lower delay at the cost of lower stop-band attenuation.

For effective implementation of subband adaptive filters, it is important to understand 
the effect of reduced delay on the performance of the analysis and synthesis filters and 
its overall impact on the performance of the adaptive filter. This can be best understood 
through an example. 

Figure 9.11 shows the magnitude and group-delay responses of three filters that have
been designed by the above method. The filter length, N, the number of subbands, M,
and the roll-off factor, $\alpha$, are set equal to 97, 4, and 0.25, respectively, and the three
designs are differentiated by the parameter K. The separation of the pass-band, transition
band, and stop-band can be clearly seen in the responses. In particular, we note that the
transition bands in the three designs are the same and match the band edges predicted by
Eq. (9.37) and Eq. (9.38). We also note that the price to be paid for achieving reduced
delay is lower stop-band attenuation, an undesirable boost in the magnitude response in
the transition band, and group-delay distortion in the transition and stop bands. However,
the magnitude and group-delay responses in the pass-band remain nearly undistorted. This
is a desirable feature of this design method that makes it very appropriate for designing
analysis as well as synthesis filters in the application of subband adaptive filtering.


9.8 A Design Procedure for Subband Adaptive Filters 


Since there are many compromises to be made in the overall design of subband adaptive
filters, it is very hard to suggest a simple design procedure for such filters. In this section,
we present a procedure that the author has found useful in his research work. This
procedure is iterative in nature and its application requires some experience. Hence, a novice
will need to experiment with it before using it for an actual design.

To choose all the parameters necessary for setting up a subband adaptive filter, one 
may take the following steps: 


1. Choose a value for the number of subbands, M. 
2 


2. Choose an integer parameter, J, in the range of 5 to $ of M, and select the pass-band, 


transition band, and stop-band of the analysis and synthesis prototype filters, as shown
in Figure 9.12. This determines the values of the roll-off factors $\alpha_a$ and $\alpha_s$ of the
analysis and synthesis filters, respectively. Note that the midpoints of the transition
bands of the analysis and synthesis prototype filters are $\pi/J$ and $\pi/M$, respectively.

Figure 9.11 Magnitude and group-delay responses of three filters that have been designed by the
method in Farhang-Boroujeny and Wang (1997).

We note that when the method of Section 9.7 is used to design the analysis and
synthesis filters, these choices of the midpoints of the transition bands lead to analysis
filters that are J-band complementary and synthesis filters that are M-band
complementary. Furthermore, the positions of these midpoints determine the range of
the roll-off factors $\alpha_a$ and $\alpha_s$ of the analysis and synthesis filters, respectively. The
range of possible values that $\alpha_a$ and $\alpha_s$ may take can easily be worked out by inspection
from Figure 9.12, and noting that the pass-band and transition band of the synthesis
filter should be covered by the pass-band of the analysis filter (see also Problem P9.3).

3. Choose values for the lengths of the analysis and synthesis filters. Call these $N_a$ and $N_s$,
respectively. Also, select values for the parameters K of the analysis and synthesis
filters. Call these $K_a$ and $K_s$, respectively.

4. Using the parameters selected above, design the analysis and synthesis prototype filters 
by following the method presented in Section 9.7.1. 

5. Evaluate the stop-band rejection of the prototype filters. If satisfactory, proceed with
the next step. Otherwise, reselect one or a few of the parameters $N_a$, $N_s$, $K_a$, $K_s$, J,
$\alpha_a$, and $\alpha_s$ and redesign the prototype filters until the design is satisfactory.

6. Select a value of $L \le J$ and evaluate the aliasing of the decimated subband signals. A
limited amount of aliasing may be allowed. However, for the reason discussed
in Section 9.6, such aliasing should be relatively small.

7. Evaluate the design by putting the designed analysis and synthesis filters in the subband 
adaptive filter structure and running a typical simulation of some application. If the 
performance is not satisfactory, then the filters need to be redesigned for other choices 
of the parameters listed above. 


Figure 9.12 Definitions of the band edges in the analysis and synthesis prototype filters.

While putting the designed analysis and synthesis filters in a subband structure, we
should note that the analysis filters are J-band complementary with parameter $K = K_a$,
while the synthesis filters are M-band complementary with parameter $K = K_s$. As a result,
the group-delay introduced by the analysis filters is $K_a J$ and that from the synthesis filters
is $K_s M$. Then, the net group-delay due to a direct cascade of the two filter banks would be
$K_a J + K_s M$. On the other hand, according to our discussion in Section 9.2, the cascade
of the analysis and synthesis filters must be M-band complementary so that the total
group-delay is an integer multiple of M. This may be achieved either by selecting $K_a$ and
J so that $K_a J + K_s M$ is an integer multiple of M, or by padding an appropriate number of
zero coefficients at the beginning of the analysis and/or synthesis filters so that the total
group-delay is made an integer multiple of M. Accordingly, the following equation may
be used to calculate the delay, $\Delta$:

$$\Delta = \text{the first integer multiple of } M \text{ that is greater than or equal to } K_a J + K_s M \tag{9.40}$$
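As an illustration of Eq. (9.40), the following minimal sketch rounds the cascade group-delay up to the next multiple of M and reports how many zero coefficients would have to be padded. The function name `cascade_delay` is hypothetical; the numerical values are the K parameters listed in Table 9.1, with J = 19 and M = 32 as used in Section 9.9.

```python
def cascade_delay(K_a, K_s, J, M):
    """Return (delta, padding) for an analysis-synthesis cascade.

    delta is the first integer multiple of M that is >= K_a*J + K_s*M, as in
    Eq. (9.40); padding is the number of zero coefficients needed to reach it.
    """
    raw = K_a * J + K_s * M        # group-delay of the direct cascade
    delta = -(-raw // M) * M       # integer ceiling to a multiple of M
    return delta, delta - raw

print(cascade_delay(5, 3, 19, 32))   # conventional-delay design
print(cascade_delay(3, 1, 19, 32))   # low-delay design
```

The two calls return delays of 192 and 96, which agree with the values of the delay listed in Table 9.1.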


9.9 An Example 


In this section, we discuss two design examples to demonstrate the effectiveness of the
design technique that was introduced in the last two sections.⁷ The aim is to see how far
we can go in reducing the delay and the price that we pay for it.

The following common parameters are used in both the designs: 

M = 32 19 : d u 
=232,. = 19; %a = ag an %s = 9 
In the first design, we ignore the problem of delay and design the analysis and synthe- 
sis filters that result in maximum attenuation in their stop-bands. As was noted before, 
maximum stop-band attenuation is achieved when the delay introduced by each filter is 
about half of its respective length. We refer to this as the conventional-delay design. In 
the second design, a few attempts are made to obtain a pair of low-delay analysis and 
synthesis filters with stop-band attenuations comparable to those in the first design. This 
is called the low-delay design. To achieve similar stop-band attenuations with reduced 
delay, the lengths of the filters need to be chosen longer than their conventional-delay 
counterparts. 

The two designs are summarized in Table 9.1. The value of the delay, $\Delta$, for each
design is calculated according to Eq. (9.40). Note that the low-delay design achieves a
delay that is half of that of the conventional-delay design. This, as expected, is at the cost
of increased filter lengths, $N_a$ and $N_s$. In Table 9.1, $E_a$ and $E_s$ are the stop-band energies
of the analysis and synthesis filters, respectively.

To illustrate the effect of the decimation factor, L, on the overall performance of the filter,
the designed low-delay analysis and synthesis filters are put into a subband structure that
is used for modeling a 1600-tap plant. The plant response is that of an acoustical echo
path of a normal size office room (Section 9.10). The plant input is assumed to be
white Gaussian noise. We also add some noise to the plant output. The normalized LMS
(NLMS) algorithm of Section 6.6 is used for adaptation of the subband filters; also see
Section 9.10. Figure 9.13 shows the learning curves of the subband adaptive filter for
values of L = 16 to 19. L = 16 corresponds to the case where the decimated subband
signals do not suffer from any aliasing. On the contrary, L = 19 corresponds to the case
where the decimated subband signals are fully aliased over the transition bands of their
respective analysis filters. However, the signals in their pass-bands do not suffer from any
serious aliasing, except that due to nonideal stop-band attenuations, which is negligible.
The case L = 17 does suffer from aliasing in the transition bands, although relatively low.
These results clearly confirm our earlier conjecture that in the selection of the decimation
factor, L, a small amount of aliasing is acceptable.

Table 9.1 Summary of the two designs of analysis-synthesis prototype filters.

Parameters   Conventional delay      Low delay
$K_a$        5                       3
$K_s$        3                       1
$N_a$        191                     289
$N_s$        193                     353
$\Delta$     192                     96
$E_a$        $1.4 \times 10^{-?}$    $5.0 \times 10^{-7}$
$E_s$        $4.1 \times 10^{-7}$    $4.6 \times 10^{-7}$

Figure 9.13 Learning curves of the subband adaptive filter for different values of the decimation
factor, L.

7 The design examples presented here and their application in the study of acoustic echo cancelers are taken from
Wang (1996).
9.10 Comparison with FBLMS Algorithm


Adaptive filtering in subbands has many similarities with the FBLMS algorithm,⁸ which
was introduced in Chapter 8. Firstly, both these methods may be categorized under the
class of block/parallel processing algorithms. As a result, both of these methods offer 
fast implementations of adaptive filters, that is, implementations with reduced complexity 
as compared to the non-block methods. Furthermore, they resolve the problem of slow 
convergence of the LMS algorithm. These advantages are obtained at the cost of certain 
processing delay at the filter output. Hence, at this point, it seems appropriate and essential 
to make some comments on the relative performance of the method of subband adaptive 
filtering and the FBLMS algorithm in terms of convergence behavior, computational 
complexity, and processing delay. However, a quantitative comparison of the two methods 
is not straightforward. Thus, in the rest of this section, we make an attempt to give some 
general comments on the above issues, leaving the discussion an open-ended one so that 
the reader can complete it by closely examining his/her specific application of interest. 

We note that adaptation of each of the subband filters can be performed using any of
the adaptive filtering algorithms that have been introduced so far or that will be introduced
in the subsequent chapters. In the discussion that follows, for convenience, we assume
that the NLMS algorithm is used for this. We thus use the term subband NLMS algorithm
to refer to this implementation of the subband adaptive structure.

Simulations and experiments show that both the subband NLMS and FBLMS algorithms 
are quite successful in decorrelating the samples of the filter input. By careful selection 
of their parameters, both these algorithms can be tuned to offer learning curves that are 
dominantly governed by a single mode of convergence. 

Comparison of the two algorithms with respect to their computational complexity is 
also not straightforward. For a pair of designs with comparable convergence behavior 
and processing delay, one may use the number of operations per sample for comparing 
the computational complexities of the two algorithms. This, although often used in the 
literature for comparing different algorithms, does not seem to be fair in the present case 
because of the many structural differences between the two algorithms. For instance, the 
subband NLMS algorithm has a more regular structure than the FBLMS algorithm. On 
the other hand, in typical applications of interest, say adaptive filters with at least a few 
hundred fullband taps, we usually find that the FBLMS algorithm has lower operation 
count than the subband NLMS algorithm. Thus, to a great extent, the choice between 
the two algorithms depends on the available hardware/software platform. In software 
implementation on digital signal processors, the FBLMS algorithm is usually found to be 
more efficient than the subband NLMS algorithm. In contrast, the more regular structure 
of subband NLMS algorithm may make it a better choice in a custom chip design. 
In particular, we may note that the subband filter structure can easily be divided into a 
number of separate blocks. For instance, each of the analysis/synthesis filter banks and the 
subband filters of Figure 9.7 may be treated as a separate block in a multiprocessor chip. 


8 In this section, we use the term FBLMS algorithm in a general sense. It includes the FBLMS as well as partitioned 
FBLMS algorithms of the last chapter. 
Delay is an adjustable parameter in the subband structure as well as in the FBLMS
algorithm. In general, by allowing a larger delay (up to a certain limit), the complexities of both
methods can be reduced. The choice of the delay is usually limited by the system
specification.


Problems 


P9.1 Show that if a sequence $y_i(k)$ is interpolated using an interpolation factor of L
and passed through a filter with the impulse response $g_n$, the resulting output may
be written as

$$y_i(n) = \sum_{k=-\infty}^{\infty} y_i(k)\, g_{n-kL}$$

P9.2 Show that the following pairs of structures are equivalent:

(i) Case I: [pair of block diagrams]

(ii) Case II: [pair of block diagrams]
P9.3 Consider a transversal adaptive filter with tap-input and tap-weight vectors x(n)
and w(n), respectively. To adapt w(n), we wish to use the delayed LMS recursion

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n-\Delta)\mathbf{x}(n-\Delta)$$

where $\Delta$ is a constant delay, $e(n) = d(n) - \mathbf{w}^T(n)\mathbf{x}(n)$ is the output error, and
d(n) is the desired signal. Study the hardware/software implementation of this
algorithm and discuss how the hardware/memory requirements of the filter vary
with $\Delta$.

Refer to the synthesis-dependent structure given in Figure 9.8. Note that there is
some delay introduced by the synthesis-analysis filters in the path from the subband
filter outputs $y_i(k)$ to the subband errors $e_i(k)$. Discuss how this delay leads
to a set of delayed LMS recursions for adaptation of the subband filters $W_i(z)$,
and how this affects the hardware/memory requirements of the subband structure.

P9.4 Consider an M-band subband structure with parameter J as defined in Section 9.8.
Show that the condition necessary for the pass-bands and transition bands of the
synthesis filters to be covered by the pass-bands of the analysis filters is the following:

$$J\alpha_s + M\alpha_a \le M - J$$

P9.5 Explore the validity of Eq. (9.23) in detail.

P9.6 Equations (9.21) and (9.22) are given for the subband adaptive filters with
real-valued input. Derive similar equations for the case where the filter input and
desired signal are complex-valued.

P9.7 In a pair of complementary analysis-synthesis filter banks, for each of the
following sets of parameters, determine the number of zeros that need to be added in
front of either the analysis or synthesis filters such that their combination is a
complementary M-band filter bank:

(i) M = 4, J = 3, $K_a = 7$, $K_s = 5$
(ii) M = 64, J = 48, $K_a = 4$, $K_s = 3$.

P9.8 Recall that in the realization of DFT analysis filters using the weighted overlap-add
method, at the last stage, we need to multiply the DFT outputs by the coefficients
given in Eq. (9.7), for k = 0, 1, ..., M - 1. On the other hand, in the realization of
DFT synthesis filters using the weighted overlap-add method, the subband signals
should be multiplied by the coefficients given in Eq. (9.15), for k = 0, 1, ..., M - 1, before
the application of the DFT. Carefully examine the structure of the subband
adaptive filter and show that these two operations may be deleted from the
structure of a subband adaptive filter without affecting its performance.


Computer-Oriented Problems 


P9.9 The MATLAB program "eigenfir.m" on the accompanying website can
be used for designing complementary eigenfilters of the type discussed in
Section 9.7.1. Use this program to design three filters with the following
specifications:

(i) N = 129, M = 4, $\alpha$ = 0.25, K = 16
(ii) N = 129, M = 4, $\alpha$ = 0.25, K = 8
(iii) N = 129, M = 4, $\alpha$ = 0.25, K = 4

For each design, confirm that the band edges are realized, as predicted in
Section 9.7.2.

P9.10 Design a pair of analysis and synthesis prototype filters with the following
parameters:

M = 16, J = 10, $N_a = N_s = 257$, $\alpha_a = \alpha_s = 0.15$, $K_a = 3$, $K_s = 4$.

Put these into a subband structure and verify that the cascade of the analysis and
synthesis filter banks is equivalent to a pure delay. For this, you may put a random
sequence as input to the analysis filter bank and observe that the same sequence,
with some delay, appears at the synthesis filter bank output. You may need to
add an appropriate number of zeros at the beginning of the analysis or synthesis
filters in order to get the right result from this experiment. Try your experiment
for different values of the decimation factor L = 7, 8, 9, and 10. Do you observe
any significant difference in the results? Explain your observation. Among these
values of L, show that only L = 7 prevents aliasing of the subband signals.

P9.11 Use the analysis and synthesis filter banks of the last problem to realize an
NLMS-based subband adaptive filter to model a plant with 500 fullband taps. Choose
a set of independent random numbers with variance 0.01 as the samples of the
plant impulse response. Also, add Gaussian noise with variance $10^{-4}$ to the plant
output as the plant noise. Run your program for different values of the decimation
factor, L, and verify that the subband adaptive filter converges toward an MSE that
is much larger than the minimum MSE when the aliasing of the subband signals is
significant. To convince yourself that this is due to some excessive misadjustment,
as discussed in Section 9.6, you can reduce the step-size parameter $\tilde{\mu}$ to some
small value, let the NLMS algorithm run for a sufficient number of iterations,
and observe that it converges toward the expected minimum MSE. To confirm
that subband adaptive filters are robust to the variation of the power spectral
density of the input, try your experiment with white as well as colored inputs.


10 


IIR Adaptive Filters 


In our study of adaptive filters in the previous chapters, we always limited ourselves to 
filters with finite-impulse response (FIR). The main feature of FIR filters, which has made 
them the most attractive structure in the application of adaptive filters, is that they are 
nonrecursive. That is, the filter output is computed based on only a finite number of input 
samples. This, as we noted in the previous chapters, results in a quadratic mean-squared 
error (MSE) performance surface, allowing us to use any of the simple gradient-based 
algorithms for finding the optimum coefficients (tap weights) of the filter. 

The use of recursive or infinite-impulse response (IIR) filters, on the other hand, has
been less popular in the realization of adaptive filters for the following reasons:


1. IIR filters can easily become unstable because their poles may get shifted out of the 
unit circle (i.e., |z| = 1, in the z-plane) by the adaptation process. 

2. The performance function (e.g., MSE as a function of filter coefficients) of an IIR 
filter, usually, has many local minima points. 


The problem of instability is usually dealt with by checking the filter coefficients after 
each adaptation step and limiting them to the range that results in a stable transfer function. 
This, in general, is a difficult job and adds additional complexity, which in many cases 
becomes significant when the filter order is large. This additional complexity tends to 
nullify the computational advantage provided by the recursive nature of these filters. 
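To make the idea of a coefficient check concrete, the sketch below covers only the second-order case, where the poles of $1/(1 - b_1 z^{-1} - b_2 z^{-2})$ lie strictly inside the unit circle if and only if $|b_2| < 1$ and $|b_1| < 1 - b_2$ (the familiar stability triangle). This is an illustrative assumption rather than a method prescribed in the text; the function names are hypothetical, and higher-order filters need a more elaborate test, which is the source of the added complexity noted above.

```python
import cmath

def poles_second_order(b1, b2):
    """Roots of z^2 - b1*z - b2 = 0, i.e. the poles of 1/(1 - b1 z^-1 - b2 z^-2)."""
    disc = cmath.sqrt(b1 * b1 + 4.0 * b2)
    return (b1 + disc) / 2.0, (b1 - disc) / 2.0

def is_stable_second_order(b1, b2):
    """Stability-triangle test: True when both poles are strictly inside |z| = 1."""
    return abs(b2) < 1.0 and abs(b1) < 1.0 - b2

for b1, b2 in [(1.5, -0.7), (2.0, -0.9)]:
    radius = max(abs(p) for p in poles_second_order(b1, b2))
    print(b1, b2, is_stable_second_order(b1, b2), radius)
```

In an adaptive setting, such a test would be applied after each coefficient update, and an update that fails it would be rejected or projected back into the stable region.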

Because of the multimodal nature of their performance surfaces, convergence of the 
IIR adaptive filters to their global minima is not guaranteed. The following approaches 
are usually used to deal with this problem: 


(i) Local minima are usually observed when the criterion used to adjust the filter 
coefficients is MSE. A modification to this criterion leads to quadratic performance 
surfaces similar to those of FIR filters, thereby eliminating the problem of local 
minima. This modification results in a special implementation of IIR adaptive filters 
known as the equation error method. The details of this method are discussed in 
Section 10.2. In contrast, the conventional formulation of IIR adaptive filters based 
on the Wiener filter theory, which may suffer from the problem of local minima, is 
referred to as the output error method. This is discussed in Section 10.1. 
(ii) For specific applications, we may limit ourselves to IIR transfer functions whose
associated MSE performance surfaces are unimodal, that is, they have no local min- 
ima. In such cases, the use of output error method is the preferred choice as its 
convergence to the associated Wiener filter is guaranteed. 


In this chapter, we discuss both the output error and equation error methods. As the
performance of these methods is application dependent, we also present two case studies
to highlight some of the implementation issues that one should consider while using IIR 
adaptive filters. The case studies that we have chosen are special applications, which 
demonstrate the efficiency of IIR adaptive filters when their structure and/or design criteria 
are wisely selected and, at the same time, some peculiar behaviors of such filters that are 
hard to predict, in general. 

The first application that we consider is adaptive line enhancement. The problem of 
adaptive line enhancement was discussed earlier in Chapter 1, where we reviewed various 
applications of adaptive filters, and also in Chapter 6, as an example of application of the 
LMS algorithm. We used a transversal filter to implement the line enhancer. However, the 
problem of line enhancement may also be viewed as one of realizing/achieving narrow- 
band adaptive filters. But, to realize a narrow-band filter in transversal form, we would 
need very long filter lengths. In the example given in Chapter 6 (Section 6.4.3), we 
used a 30-tap transversal filter to achieve satisfactory enhancement of a single sinusoidal 
signal. On the contrary, as we will see later, a second-order IIR adaptive filter with four 
coefficients is sufficient for this problem. In Section 10.3, we introduce and study a special 
form of transfer function, which has been found very appropriate for realization of IIR 
line enhancers. This is a good representative example of the second approach cited earlier, 
showing how a wise choice of the transfer function in a specific application can lead to 
a unimodal performance function, thereby solving the problem of local minima of the 
output error method. 

The second application of IIR adaptive filters that we discuss is equalization of magnetic 
recording channels. In the case of magnetic recording channels, realization of equalizers 
in digital form turns out to be very costly because of very high data rates (a few hundred 
megabits per second). To solve this problem, the general trend in the present industry is to 
use analog equalizers. We use the techniques presented in this chapter as tools to design 
analog equalizers for magnetic recording channels. This serves as a good representative 
example of the use of equation error method. 


10.1 Output Error Method 


The output error method results when the Wiener filter theory is used in a direct
manner to develop algorithms for the design and/or adaptation of IIR filters. This can be
best explained in the context of a system modeling problem as depicted in Figure 10.1.
According to the Wiener theory, the coefficients of the recursive transfer function

$$W(z) = \frac{A(z)}{1 - B(z)} \tag{10.1}$$

where A(z) and B(z) are polynomials in z, are obtained by minimizing the output
error, e(n), in the mean-square sense. We thus need to find the global minimum of
the performance function $\xi = E[e^2(n)]$ in an adaptive manner. However, we note that
the performance function $\xi$ is, in general, a multimodal function of the coefficients of the
filter W(z), that is, $\xi$ may have many local minima (see Chapter 3). This may lead to
convergence of any gradient-based algorithm (such as LMS) to a suboptimal solution. In this
section, we ignore this problem and simply develop an LMS algorithm for adaptation of
the coefficients of W(z). As the coefficients are obtained by minimizing the output error
(in some sense), this approach is named the "output error method." The name
emphasizes the distinction between the output error method and the
equation error method, which is based on a different criterion (see the following section).

Figure 10.1 IIR adaptive filter with output error adaptation method.

Next, we develop an LMS algorithm for adaptation of the coefficients of IIR filters. To
facilitate this, we define the time-varying transfer functions

$$A(z,n) = \sum_{i=0}^{N} a_i(n) z^{-i} \tag{10.2}$$

and

$$B(z,n) = \sum_{i=1}^{M} b_i(n) z^{-i} \tag{10.3}$$

and note that the output, y(n), of the adaptive IIR filter is obtained according to the
equation

$$y(n) = \sum_{i=0}^{N} a_i(n) x(n-i) + \sum_{i=1}^{M} b_i(n) y(n-i) \tag{10.4}$$
The LMS algorithm, for this case, can now be derived by following a line of
derivations similar to that given in the previous chapters for FIR filters. In particular,
we recall that the LMS algorithm makes use of the stochastic gradient vector given by

$$\nabla(n) = \nabla_{\mathbf{w}} e^2(n) = 2e(n)\nabla_{\mathbf{w}} e(n) \tag{10.5}$$

where $\nabla_{\mathbf{w}}$ is the gradient operator with respect to the filter tap-weight vector, w(n), and

$$e(n) = d(n) - y(n) \tag{10.6}$$

is the output error. Here, the filter tap-weight vector w(n) is defined as

$$\mathbf{w}(n) = [a_0(n)\; a_1(n)\; \cdots\; a_N(n)\; b_1(n)\; \cdots\; b_M(n)]^T \tag{10.7}$$


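Substituting Eq. (10.6) in Eq. (10.5) and noting that d(n) is independent of w(n),
we obtain

$$\nabla(n) = -2e(n)\nabla_{\mathbf{w}} y(n) = -2e(n)\left[\frac{\partial y(n)}{\partial a_0(n)}\; \frac{\partial y(n)}{\partial a_1(n)}\; \cdots\; \frac{\partial y(n)}{\partial a_N(n)}\; \frac{\partial y(n)}{\partial b_1(n)}\; \cdots\; \frac{\partial y(n)}{\partial b_M(n)}\right]^T \tag{10.8}$$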


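The derivatives in Eq. (10.8) should be considered with special care, as y(n) depends on
its previous values, y(n - 1), y(n - 2), ...
From Eq. (10.4), we get

$$\frac{\partial y(n)}{\partial a_i(n)} = x(n-i) + \sum_{l=1}^{M} b_l(n)\frac{\partial y(n-l)}{\partial a_i(n)}, \quad \text{for } i = 0, 1, \ldots, N \tag{10.9}$$

and

$$\frac{\partial y(n)}{\partial b_i(n)} = y(n-i) + \sum_{l=1}^{M} b_l(n)\frac{\partial y(n-l)}{\partial b_i(n)}, \quad \text{for } i = 1, 2, \ldots, M \tag{10.10}$$

To proceed, it is convenient to define

$$\alpha_i(n) = \frac{\partial y(n)}{\partial a_i(n)}, \quad \text{for } i = 0, 1, \ldots, N \tag{10.11}$$

and

$$\beta_i(n) = \frac{\partial y(n)}{\partial b_i(n)}, \quad \text{for } i = 1, 2, \ldots, M \tag{10.12}$$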

Assuming that the coefficients $a_i(n)$ and $b_i(n)$ vary slowly in time, we get

$$\frac{\partial y(n-l)}{\partial a_i(n)} \approx \frac{\partial y(n-l)}{\partial a_i(n-l)} = \alpha_i(n-l) \tag{10.13}$$

and

$$\frac{\partial y(n-l)}{\partial b_i(n)} \approx \frac{\partial y(n-l)}{\partial b_i(n-l)} = \beta_i(n-l) \tag{10.14}$$

for l = 1, 2, ..., M. Substituting Eqs. (10.13) and (10.14) in Eqs. (10.9) and (10.10),
respectively, we get the following recursive equations for obtaining the successive samples
of $\alpha_i(n)$ and $\beta_i(n)$:

$$\alpha_i(n) = x(n-i) + \sum_{l=1}^{M} b_l(n)\alpha_i(n-l) \tag{10.15}$$

and

$$\beta_i(n) = y(n-i) + \sum_{l=1}^{M} b_l(n)\beta_i(n-l) \tag{10.16}$$

Using these results, the LMS recursion for adaptation of IIR filters may be
summarized as

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n)\boldsymbol{\eta}(n) \tag{10.17}$$

where

$$\boldsymbol{\eta}(n) = [\alpha_0(n)\; \alpha_1(n)\; \cdots\; \alpha_N(n)\; \beta_1(n)\; \cdots\; \beta_M(n)]^T \tag{10.18}$$

and $\alpha_i(n)$ and $\beta_i(n)$ are obtained recursively according to Eqs. (10.15) and (10.16),
respectively.


Figure 10.2 Implementation of IIR adaptive filter using the output error adaptation method.

Figure 10.2 depicts a block diagram showing the computations involved in calculating
$\alpha_i(n)$ and $\beta_i(n)$, according to Eqs. (10.15) and (10.16), respectively. From this diagram,
we see that the computation of the elements of $\boldsymbol{\eta}(n)$ requires parallel implementation of
M + N + 1 recursive filters with the same transfer function 1/(1 - B(z, n)), but different
inputs, one for each element.

The diagram of Figure 10.2 can be greatly simplified if we assume that the transfer
function 1/(1 - B(z, n)) varies only slowly with time. Then, we may use the following
approximations:

$$\frac{1}{1 - B(z,n)} \approx \frac{1}{1 - B(z,n-i)}, \quad \text{for } i = 1, 2, \ldots, \max(N, M-1) \tag{10.19}$$


where max(N, M - 1) denotes the maximum of N and M - 1. This allows us to write from
Eq. (10.15)

$$\alpha_i(n) \approx x(n-i) + \sum_{l=1}^{M} b_l(n-i)\alpha_i(n-l) \tag{10.20}$$

On the other hand, substituting i by 0 and n by n - i in Eq. (10.15), we get

$$\alpha_0(n-i) = x(n-i) + \sum_{l=1}^{M} b_l(n-i)\alpha_0(n-i-l) \tag{10.21}$$

Now, comparing Eqs. (10.21) and (10.20), we note that $\alpha_i(n)$ and $\alpha_0(n-i)$ are generated
based on the same input, x(n - i), and approximately the same recursive equations. Thus,
we get

$$\alpha_i(n) \approx \alpha_0(n-i), \quad \text{for } i = 1, 2, \ldots, N \tag{10.22}$$

Similarly, we obtain

$$\beta_i(n) \approx \beta_1(n-i+1), \quad \text{for } i = 2, 3, \ldots, M \tag{10.23}$$


Using these results, we obtain Figure 10.3 as an approximation to Figure 10.2. Note that
in Figure 10.3, as opposed to Figure 10.2, we only need to use two filters with the transfer
function 1/(1 - B(z, n)) to calculate $\alpha_0(n)$ and $\beta_1(n)$. The rest of the values of $\alpha_i(n)$ and
$\beta_i(n)$ are simply delayed versions of $\alpha_0(n)$ and $\beta_1(n)$, respectively. Table 10.1 gives a
summary of the LMS algorithm, which follows Figure 10.3.
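The simplified recursion can be sketched in a few lines of code. The following is a minimal illustration under stated assumptions, not the program accompanying the book: the function name `output_error_lms` and the test plant $1/(1 - 0.5z^{-1})$ are hypothetical, the step-size is fixed, $M \ge 1$ is assumed, and no stability monitoring of the adapted poles is included.

```python
import random

def output_error_lms(x, d, N, M, mu):
    """Sketch of the simplified output-error LMS recursion (assumes M >= 1)."""
    a = [0.0] * (N + 1)                   # a_0(n), ..., a_N(n)
    b = [0.0] * M                         # b_1(n), ..., b_M(n)
    x_hist = [0.0] * (N + 1)              # x(n), ..., x(n-N)
    y_hist = [0.0] * M                    # y(n-1), ..., y(n-M)
    alpha_hist = [0.0] * (max(N, M) + 1)  # past samples of alpha_0, newest first
    beta_hist = [0.0] * (M + 1)           # past samples of beta_1, newest first
    errors = []
    for xn, dn in zip(x, d):
        x_hist = [xn] + x_hist[:-1]
        # Filtering, Eq. (10.4)
        y = (sum(ai * xi for ai, xi in zip(a, x_hist))
             + sum(bi * yi for bi, yi in zip(b, y_hist)))
        # Error estimation, Eq. (10.6)
        e = dn - y
        # Gradient signals: Eqs. (10.15) and (10.16) for alpha_0(n) and beta_1(n)
        alpha_new = xn + sum(b[l] * alpha_hist[l] for l in range(M))
        beta_new = y_hist[0] + sum(b[l] * beta_hist[l] for l in range(M))
        alpha_hist = [alpha_new] + alpha_hist[:-1]
        beta_hist = [beta_new] + beta_hist[:-1]
        # Tap-weight update, Eq. (10.17), using the delayed versions
        # of alpha_0 and beta_1 from Eqs. (10.22) and (10.23)
        a = [ai + 2 * mu * e * g for ai, g in zip(a, alpha_hist[:N + 1])]
        b = [bi + 2 * mu * e * g for bi, g in zip(b, beta_hist[:M])]
        y_hist = [y] + y_hist[:-1]
        errors.append(e)
    return a, b, errors

# Noise-free modeling of the hypothetical plant 1/(1 - 0.5 z^-1) with white input.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(3000)]
d, prev = [], 0.0
for xn in x:
    prev = xn + 0.5 * prev
    d.append(prev)
a, b, errors = output_error_lms(x, d, N=0, M=1, mu=0.01)
print(a, b)
```

In this matched, noise-free setting the estimates are expected to approach $a_0 = 1$ and $b_1 = 0.5$. With other plants, inputs, or step-sizes, the recursion may settle in a local minimum or drive a pole outside the unit circle, as discussed at the beginning of this chapter.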


10.2 Equation Error Method 


As was mentioned before, the main problem with the direct minimization of the output error
of an IIR filter is that the associated performance surface may have many local minima
points, thereby resulting in convergence of the LMS algorithm to one of these local
minima, which may not be the desired global minimum. This problem may be resolved
by using the method of equation error, as explained below.

Figure 10.4 depicts a block diagram illustrating the principle behind the equation error
method. Here, the error used to adapt the transfer functions A(z) and B(z) is

$$e'(n) = d(n) - y'(n) \tag{10.24}$$
Table 10.1 Summary of the output error LMS algorithm.

Input:  Tap-weight vector,
          $\mathbf{w}(n) = [a_0(n)\; a_1(n)\; \cdots\; a_N(n)\; b_1(n)\; \cdots\; b_M(n)]^T$,
        Input vector,
          $\mathbf{u}(n) = [x(n)\; x(n-1)\; \cdots\; x(n-N)\; y(n-1)\; \cdots\; y(n-M)]^T$,
        the previous samples of $\alpha_0(n)$ and $\beta_1(n)$, and Desired output, d(n).
Output: Filter output, y(n),
        Tap-weight vector update, $\mathbf{w}(n+1)$,
        and the samples of $\alpha_0(n)$ and $\beta_1(n)$ for the next iteration.

1. Filtering:
     $y(n) = \mathbf{w}^T(n)\mathbf{u}(n)$
2. Error estimation:
     $e(n) = d(n) - y(n)$
3. $\boldsymbol{\eta}(n)$ update:
     $\alpha_0(n) = x(n) + \sum_{l=1}^{M} b_l(n)\alpha_0(n-l)$
     $\beta_1(n) = y(n-1) + \sum_{l=1}^{M} b_l(n)\beta_1(n-l)$
     $\boldsymbol{\eta}(n) = [\alpha_0(n)\; \alpha_0(n-1)\; \cdots\; \alpha_0(n-N)\; \beta_1(n)\; \cdots\; \beta_1(n-M+1)]^T$
4. Tap-weight vector adaptation:
     $\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n)\boldsymbol{\eta}(n)$

Figure 10.3 Simplified implementation of IIR adaptive filter using the output error adaptation
method.
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plant noise, e,(n) 


s 


y(n) a 
Figure 10.4 IIR adaptive filter using the equation error method. 
where 
N M 
y(n) = Xa, (n)x(n— i) + D b;(n)d(n — i) (10.25) 
i=0 i=l 


This equation may be thought of as a modified version of Eq. (10.4). It is obtained by 
replacing the past samples of output, y(n — 1), y(n — 2), ..., in Eq. (10.4), by the past 
samples of the desired output, d(n — 1), d(n — 2), .... The name “equation error” refers 
to this difference in the equation used to calculate the error e’(n), as against the exact 
value of the output error, e(n). 

The adoption of the equation error method is based on the following rationale. When the structure and order of an adaptive filter are correctly selected, one would expect $d(n) \approx y(n)$ upon adaptation of the filter. In that case, the difference between the error sequences $e(n)$ and $e'(n)$, when both have converged toward their optimum values, is expected to be small. Hence, we may expect the performance surfaces associated with the output error and equation error methods to have approximately the same global minimum points. The use of equation error is then preferred, as its associated performance surface is unimodal, that is, it has no local minimum other than the global one. This unimodality results from the fact that the filter output $y'(n)$ in Eq. (10.25) is no longer recursive, that is, $y'(n)$ is effectively the output of a linear combiner with the tap-weight vector $\mathbf{w}(n)$, as defined by Eq. (10.7), and the tap-input vector

$$\mathbf{u}'(n) = [x(n)\ x(n-1) \cdots x(n-N)\ d(n-1) \cdots d(n-M)]^T. \tag{10.26}$$


Now, we present a study of the equation error method that reveals, in greater detail, how it relates to and differs from the output error method. For this study, we note that in the equation error method the criterion used for adaptation of the transfer functions $A(z)$ and $B(z)$ is

$$\xi' = E[e'^2(n)] \tag{10.27}$$

It is straightforward to show that this is a quadratic function of the coefficients of $A(z)$ and $B(z)$. This readily follows from the fact that the output $y'(n)$, as mentioned earlier, is the output of a linear combiner driven by the input sequences $x(n)$ and $d(n)$. Hence, convergence of the LMS or any other gradient-based algorithm, which may be used to find the optimum tap weights of $A(z)$ and $B(z)$, is guaranteed.

We obtain a better understanding of the equation error method by finding the relationship 
between the output and equation errors, e(n) and e’(n), respectively. This relationship can 
be easily arrived at using the z-transform approach. From Figure 10.1, we note that 


$$D(z) = X(z)G(z) + E_o(z) \tag{10.28}$$

$$Y(z) = \frac{X(z)A(z)}{1 - B(z)} \tag{10.29}$$

where $D(z)$, $X(z)$, $E_o(z)$, and $Y(z)$ are the z-transforms of the sequences $d(n)$, $x(n)$, $e_o(n)$, and $y(n)$, respectively. Then, because $e(n) = d(n) - y(n)$, we obtain from Eqs. (10.28) and (10.29)

$$E(z) = D(z) - Y(z) = X(z)\left(G(z) - \frac{A(z)}{1 - B(z)}\right) + E_o(z) \tag{10.30}$$

where $E(z)$ is the z-transform of the sequence $e(n)$. On the other hand, from Figure 10.4, we note that $e'(n) = d(n) - y'(n)$, with $y'(n)$ as given in Eq. (10.25). Hence, we get


$$E'(z) = D(z) - D(z)B(z) - X(z)A(z) \tag{10.31}$$

where $E'(z)$ is the z-transform of the sequence $e'(n)$. Substituting Eq. (10.28) in Eq. (10.31) and rearranging, we get

$$E'(z) = X(z)\left[(1 - B(z))G(z) - A(z)\right] + [1 - B(z)]E_o(z) = \left[X(z)\left(G(z) - \frac{A(z)}{1 - B(z)}\right) + E_o(z)\right][1 - B(z)] \tag{10.32}$$

Finally, comparing Eqs. (10.30) and (10.32), we obtain

$$E'(z) = E(z)(1 - B(z)) \tag{10.33}$$


This result shows that the equation error, $e'(n)$, is related to the output error, $e(n)$, through the transfer function $1 - B(z)$. In general, minimization of the mean-squared values of the output and equation errors, $e(n)$ and $e'(n)$, respectively, could lead to two different sets of tap weights for the IIR filter. However, the two solutions may be very close in certain cases. For instance, when $\xi' = E[e'^2(n)]$ converges to a very small value and $1 - B(z)$ is not very small anywhere on the unit circle, $\xi = E[e^2(n)]$ would also be very small; thus, we expect both the output and equation error methods to converge to about the same solution. On the other hand, when the minimum value of $\xi'$ is large or $1 - B(z)$ is very small over a range of frequencies, the two solutions may be significantly different. The following example clarifies this concept further.
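Before the example, the time-domain form of Eq. (10.33), $e'(n) = e(n) - \sum_{i} b_i e(n-i)$, can be checked numerically for fixed coefficients. The sketch below is only an illustration: the helper names, the plant, the noise level, and the coefficients of the modeling filter are arbitrary choices made for this check, and all sequences are assumed to be zero before n = 0.

```python
import random

def iir_output(a, b, x):
    """y(n) = sum_i a_i x(n-i) + sum_i b_i y(n-i), with zero initial conditions."""
    y = []
    for n in range(len(x)):
        acc = sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
        acc += sum(b[i - 1] * y[n - i] for i in range(1, len(b) + 1) if n - i >= 0)
        y.append(acc)
    return y

def equation_error_output(a, b, x, d):
    """y'(n) of Eq. (10.25): past outputs are replaced by past desired samples."""
    yp = []
    for n in range(len(x)):
        acc = sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
        acc += sum(b[i - 1] * d[n - i] for i in range(1, len(b) + 1) if n - i >= 0)
        yp.append(acc)
    return yp

random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
# Hypothetical plant output plus plant noise
d = [dn + 0.3 * random.gauss(0.0, 1.0) for dn in iir_output([1.0], [0.5], x)]

a, b = [0.8, 0.1], [0.3, -0.2]          # arbitrary, fixed A(z) and B(z)
e = [dn - yn for dn, yn in zip(d, iir_output(a, b, x))]                 # output error
e_eq = [dn - yn for dn, yn in zip(d, equation_error_output(a, b, x, d))]  # equation error

# Pass e(n) through 1 - B(z) and compare with e'(n)
e_filtered = [e[n] - sum(b[i - 1] * e[n - i] for i in range(1, len(b) + 1) if n - i >= 0)
              for n in range(len(e))]
max_gap = max(abs(u - v) for u, v in zip(e_eq, e_filtered))
print(max_gap)
```

The two sequences agree to rounding error for any fixed coefficient set, which is the content of Eq. (10.33). The relation says nothing about the two minimizers coinciding, as the example that follows shows.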


Example 10.1
Consider Figures 10.1 and 10.4. Let the plant $G(z)$ be given by

$$G(z) = \frac{1}{1 - 0.5z^{-1}}$$

and choose the modeling filter as

$$W(z) = \frac{a_0}{1 - b_1 z^{-1}}$$

Clearly, when the plant noise $e_o(n)$ is uncorrelated with the input, $x(n)$, the minimum MSE (Wiener) solution to this problem, that is, what we obtain by using the output error method (assuming that the global minimum of the corresponding mean-squared error function can be found), is $a_{0,o} = 1$ and $b_{1,o} = 0.5$. Here, the subscript "o" emphasizes that the coefficients are those of the optimum Wiener filter. The minimum MSE in this case is

$$\xi_{\min} = \sigma_o^2$$

where $\sigma_o^2 = E[e_o^2(n)]$.

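To find the optimum values of $a_0$ and $b_1$ in the case of the equation error method, we note that the filter tap-input and tap-weight vectors are, respectively, $\mathbf{u}'(n) = [x(n)\ d(n-1)]^T$ and $\mathbf{w} = [a_0\ b_1]^T$, and the desired signal is $d(n)$. The optimum value of $\mathbf{w}$ is then obtained by solving the normal equation

$$\mathbf{R}\mathbf{w} = \mathbf{p} \tag{10.34}$$

where

$$\mathbf{R} = E[\mathbf{u}'(n)\mathbf{u}'^T(n)] = \begin{bmatrix} E[x^2(n)] & E[x(n)d(n-1)] \\ E[d(n-1)x(n)] & E[d^2(n-1)] \end{bmatrix}$$

and

$$\mathbf{p} = \begin{bmatrix} E[d(n)x(n)] \\ E[d(n)d(n-1)] \end{bmatrix}$$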


To facilitate evaluation of $\mathbf{R}$ and $\mathbf{p}$ and the subsequent calculations in this example, we assume that the input, $x(n)$, is white and has variance unity. This implies that the power spectral density of $x(n)$ is equal to 1 for all frequencies, that is, $\Phi_{xx}(z) = 1$. Also, $E[x^2(n)] = 1$. The rest of the elements of the correlation matrix $\mathbf{R}$ and the cross-correlation vector $\mathbf{p}$ are obtained by using the results of Chapter 2. For example,

$$E[d(n)x(n)] = \phi_{dx}(0) = \frac{1}{2\pi j}\oint \Phi_{dx}(z)\frac{dz}{z} = \frac{1}{2\pi j}\oint \frac{1}{1 - 0.5z^{-1}}\frac{dz}{z} = \frac{1}{2\pi j}\oint \frac{dz}{z - 0.5} = \text{residue of } \frac{1}{z - 0.5} \text{ at } z = 0.5 = 1$$

Similarly, we also obtain

$$E[d^2(n)] = \frac{4}{3} + \sigma_o^2, \quad E[x(n)d(n-1)] = E[d(n-1)x(n)] = 0$$

and

$$E[d(n)d(n-1)] = \frac{2}{3}$$

Substituting these results in Eq. (10.34) and solving for $\mathbf{w}$, we obtain

$$a_{0,e} = 1 \quad \text{and} \quad b_{1,e} = \frac{1}{2}\cdot\frac{1}{1 + 3\sigma_o^2/4}$$

where the subscript "e" signifies that the solutions correspond to the equation error method. We note that, in this particular case, $a_{0,e}$ is unbiased, that is, it is equal to its optimum value. However, $b_{1,e}$ is different from its optimum value in the Wiener filter. The amount of bias in $b_{1,e}$ is

$$b_{1,o} - b_{1,e} = \frac{1}{2}\cdot\frac{3\sigma_o^2/4}{1 + 3\sigma_o^2/4}$$

This bias is negligible when $\sigma_o^2$ is small. However, it becomes significant as $\sigma_o^2$ increases.


Further study of this example shows that when $x(n)$ is colored (nonwhite), both $a_{0,e}$ and $b_{1,e}$ are biased and the amount of bias, as we expect, increases with $\sigma_o^2$. This is left as an exercise for the reader (see Problem P10.2).
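The arithmetic of Example 10.1 is easy to reproduce. The short sketch below solves the 2-by-2 normal equation for a few noise variances, using the correlation values quoted in the example (white, unit-variance input). The function name and the chosen list of variances are illustrative assumptions only.

```python
def equation_error_solution(sigma_o2):
    """Solve R w = p of Eq. (10.34) for Example 10.1."""
    r11 = 1.0                       # E[x^2(n)]
    r12 = 0.0                       # E[x(n) d(n-1)]
    r22 = 4.0 / 3.0 + sigma_o2      # E[d^2(n-1)]
    p1, p2 = 1.0, 2.0 / 3.0         # E[d(n) x(n)], E[d(n) d(n-1)]
    det = r11 * r22 - r12 * r12
    a0 = (r22 * p1 - r12 * p2) / det
    b1 = (r11 * p2 - r12 * p1) / det
    return a0, b1

results = {}
for sigma_o2 in (0.0, 0.01, 0.1, 1.0, 4.0 / 3.0):
    a0, b1 = equation_error_solution(sigma_o2)
    bias = 0.5 - b1                 # b_{1,o} - b_{1,e}
    results[sigma_o2] = (a0, b1, bias)
    print(f"sigma_o^2 = {sigma_o2:.3f}: a0 = {a0:.3f}, b1 = {b1:.4f}, bias = {bias:.4f}")
```

With no plant noise the equation error solution coincides with the Wiener solution, and at $\sigma_o^2 = 4/3$ (plant noise as strong as the noise-free plant output) the estimate of $b_1$ has already dropped to half of its true value.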


10.3 Case Study I: IIR Adaptive Line Enhancement 


As was noted earlier in this chapter, adaptive line enhancement is a special problem that can be best solved by using IIR filters. In this section, we consider a special second-order IIR transfer function that was first proposed by David et al. (1983) and subsequently used and developed further by the same authors and others (Ahmed et al., 1984; Cupo and Gitlin, 1989; Hush et al., 1986; Regalia, 1991; Cho and Lee, 1993; Farhang-Boroujeny and Wang, 1997).

Figure 10.5 depicts the block diagram of the adaptive line enhancer (ALE) that we 
wish to study in this section. Here, W(z) is an IIR filter with the transfer function 


$$W(z) = \frac{(1 - s)(w - z^{-1})}{1 - (1 + s)wz^{-1} + sz^{-2}} \tag{10.35}$$

This is a narrow-band filter that may be used to extract a portion of the spectrum of the input, $x(n)$. When $x(n)$ is the sum of a narrow-band and a wide-band process and $W(z)$ is centered around the narrow-band part of $x(n)$, the output of $W(z)$ will contain mainly the narrow-band part of $x(n)$. The term line enhancer, thus, refers to the fact that the narrow-band part of $x(n)$, which may be considered as a spectral line, is enhanced in the sense that it is separated from the wide-band part of $x(n)$, which may be thought of as noise. In the following, we look into the details of the IIR ALE.

Figure 10.5 Adaptive line enhancer.


10.3.1 IIR ALE Filter, W(z)


As was noted earlier, the transfer function $W(z)$ of Eq. (10.35) is that of a narrow-band filter. Its bandwidth is controlled by the parameter $s$, which may take any value in the range from 0 to 1. Filters with very narrow bandwidth can be realized by choosing values of $s$ very close to 1. The parameter $w$ is related to the center frequency, $\theta$, of the passband of $W(z)$ according to the following equation:

$$w = \cos\theta \tag{10.36}$$

Substituting Eq. (10.36) in Eq. (10.35) and evaluating $W(z)$ at $z = e^{j\theta}$, that is, the frequency response of $W(z)$ at the center of its passband, we obtain

$$W(e^{j\theta}) = e^{j\theta} \tag{10.37}$$

This shows that at $z = e^{j\cos^{-1}w}$, $W(z) = z$ or, equivalently,

$$\text{at frequency } \omega = \cos^{-1}w, \quad z^{-1}W(z) = 1 \tag{10.38}$$

Noting that $z^{-1}W(z)$ is the transfer function between the input, $x(n)$, and the output, $y(n)$, of the line enhancer, the above result implies that the gain of the line enhancer to a sinusoid at frequency $\omega = \cos^{-1}w$ is exactly equal to 1. This interesting property of the IIR line enhancer of Eq. (10.35) becomes advantageous in applications of notch filtering and also when multiple stages of line enhancers are cascaded together to enhance multiple sinusoids (spectral lines). Application of the line enhancer structure of Figure 10.5 as a notch filter is obvious if we note that the transfer function between the input, $x(n)$, and the error, $e(n)$, is $1 - z^{-1}W(z)$ and, according to Eq. (10.38), this has a null at $\omega = \cos^{-1}w$.
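The unit-gain property of Eq. (10.38) can be confirmed by evaluating the transfer function on the unit circle. The following sketch assumes the form $W(z) = (1-s)(w - z^{-1})/(1 - (1+s)wz^{-1} + sz^{-2})$ for Eq. (10.35); the function name `ale_W` and the chosen test frequencies are illustrative.

```python
import cmath
import math

def ale_W(z, s, w):
    """Assumed form of W(z) in Eq. (10.35), evaluated at a complex point z."""
    zi = 1.0 / z
    return (1.0 - s) * (w - zi) / (1.0 - (1.0 + s) * w * zi + s * zi * zi)

theta = math.pi / 3                     # center frequency, so w = cos(theta)
w = math.cos(theta)
center_gains, off_center_gains = [], []
for s in (0.25, 0.5, 0.9):
    z0 = cmath.exp(1j * theta)
    center_gains.append(ale_W(z0, s, w) / z0)        # z^-1 W(z) at the center
    z1 = cmath.exp(1j * math.pi / 2)                 # a frequency away from the center
    off_center_gains.append(abs(ale_W(z1, s, w) / z1))

for s, g, g_off in zip((0.25, 0.5, 0.9), center_gains, off_center_gains):
    print(f"s = {s}: gain at center = {g:.6f}, |gain| at pi/2 = {g_off:.4f}")
```

The gain at the center frequency equals 1 regardless of $s$, so the notch response $1 - z^{-1}W(z)$ is exactly zero there, while the gain away from the center shrinks as $s$ approaches 1.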
10.3.2 Performance Functions

To simplify our discussion, we assume that the input signal to the ALE is

$$x(n) = a\sin(\theta_0 n) + v(n) \tag{10.39}$$

where $a$ and $\theta_0$ are constants and $v(n)$ is a zero-mean white noise process with variance $\sigma_v^2$. We refer to the first term in Eq. (10.39) as the (desired) signal and to $v(n)$ as noise. When $x(n)$ is given by Eq. (10.39), the performance function $\xi_w(s, w) = E[e^2(n)]$ of the IIR ALE is given by the following equation (see Problem P10.3):

$$\xi_w(s, w) = \frac{a^2}{2}\left|1 - e^{-j\theta_0}W(e^{j\theta_0})\right|^2 + \frac{2\sigma_v^2}{1 + s} \tag{10.40}$$

The subscript $w$ in $\xi_w(s, w)$ signifies the fact that, as we see shortly, this is the performance function that is used to adjust $w$. In contrast, we define another performance function, $\xi_s(s, w)$, later, which is used for adapting the parameter $s$. Figure 10.6 shows a set of plots of $\xi_w(s, w)$ as a function of $w$ when $s$ is given different values. These plots correspond to the case where $\theta_0 = \pi/3$, $a = \sqrt{2}$, and $\sigma_v^2 = 1$. Observe from these plots that the performance function $\xi_w(s, w)$ is a unimodal function of $w$. Its minimum corresponds to $w = \cos\theta_0$, irrespective of the value of $s$. This can be easily proved analytically and is left as an exercise for the reader. This observation suggests that if $s$ is kept fixed, the optimum value of $w$ can be obtained by using a gradient search method, such as the LMS algorithm.

Figure 10.6 Plots of the performance function $\xi_w(s, w)$ for different values of $s$.

Furthermore, note also from Figure 10.6 that if $w$ is set to its optimum value, the minimum value of $\xi_w(s, w)$ reduces as $s$ approaches 1. This clearly improves the performance of the ALE. On the other hand, when $w$ is not close to its optimum value, increasing the value of $s$ slows down the convergence of $w$, as the gradient of $\xi_w(s, w)$ is quite small when $w$ is away from its optimum value and $s$ is close to 1. To solve this problem, the parameter $s$ may initially be given a smaller value and, after or close to the convergence of $w$, changed to a larger value (Cho and Lee, 1993). To automate this, we need to find another performance function that allows us to quantify or detect the closeness of $w$ to its optimum value. A possible performance function that may be used for this purpose is¹

$$\xi_s(s, w) = E[y_s^2(n)] \tag{10.41}$$

where

$$y_s(n) = \sqrt{\frac{1 + s}{1 - s}}\,y(n) \tag{10.42}$$

The adaptation of the parameter $s$ is done by maximizing $\xi_s(s, w)$ with respect to $s$. It is straightforward to show that

$$\xi_s(s, w) = \frac{1 + s}{1 - s}\cdot\frac{a^2}{2}\left|W(e^{j\theta_0})\right|^2 + \sigma_v^2. \tag{10.43}$$

Figure 10.7 shows the plots of $\xi_s(s, w)$, as a function of $w$, for $\theta_0 = \pi/3$, $a = \sqrt{2}$, $\sigma_v^2 = 1$, and $s = 0.5$, 0.7, and 0.8. These plots clearly show that the performance function $\xi_s(s, w)$ is a proper choice for adjusting $s$. It satisfies the requirements stated earlier for changing $s$, namely, the maximization of $\xi_s(s, w)$ reduces $s$ when $w$ is far from its optimum value, and increases $s$ as $w$ approaches its optimum value. This can also be shown by observing the sign of $\partial\xi_s(s, w)/\partial s$ as $w$ varies.
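The qualitative behavior described for the two performance functions can be reproduced numerically. The sketch below evaluates them for the parameters quoted in the text ($\theta_0 = \pi/3$, $a = \sqrt{2}$, $\sigma_v^2 = 1$). It assumes the closed forms $\xi_w = (a^2/2)|1 - e^{-j\theta_0}W(e^{j\theta_0})|^2 + 2\sigma_v^2/(1+s)$ and $\xi_s = \frac{1+s}{1-s}(a^2/2)|W(e^{j\theta_0})|^2 + \sigma_v^2$; the function names and the search grid are choices made for this illustration.

```python
import cmath
import math

def ale_W(z, s, w):
    """Assumed form of W(z) in Eq. (10.35)."""
    zi = 1.0 / z
    return (1.0 - s) * (w - zi) / (1.0 - (1.0 + s) * w * zi + s * zi * zi)

def xi_w(s, w, theta0, a, sigma_v2):
    """Assumed closed form of the performance function used to adapt w."""
    z = cmath.exp(1j * theta0)
    return 0.5 * a * a * abs(1.0 - ale_W(z, s, w) / z) ** 2 + 2.0 * sigma_v2 / (1.0 + s)

def xi_s(s, w, theta0, a, sigma_v2):
    """Assumed closed form of the performance function used to adapt s."""
    z = cmath.exp(1j * theta0)
    return (1.0 + s) / (1.0 - s) * 0.5 * a * a * abs(ale_W(z, s, w)) ** 2 + sigma_v2

theta0, a, sigma_v2 = math.pi / 3, math.sqrt(2.0), 1.0
grid = [i / 100.0 for i in range(-99, 100)]          # candidate values of w

# The minimizer of xi_w over w should sit at cos(theta0) = 0.5 for every s
w_best = {s: min(grid, key=lambda w: xi_w(s, w, theta0, a, sigma_v2))
          for s in (0.5, 0.7, 0.8)}
print(w_best)

# xi_s grows with s near the optimum w and shrinks with s far from it
near = [xi_s(s, 0.5, theta0, a, sigma_v2) for s in (0.5, 0.8)]
far = [xi_s(s, -0.5, theta0, a, sigma_v2) for s in (0.5, 0.8)]
print(near, far)
```

Under these assumptions, maximizing $\xi_s$ over $s$ pushes $s$ up when $w$ is near $\cos\theta_0$ and down when it is far away, which is the behavior the text asks of a performance function for $s$.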


10.3.3 Simultaneous Adaptation of s and w 


Following derivations similar to those given in Section 10.1, we obtain the algorithm presented in Table 10.2 for simultaneous adaptation of $s$ and $w$. We refer to this as Algorithm 1, for later reference. As in Table 10.2, henceforth we use the notation $s(n)$ and $w(n)$ for $s$ and $w$, respectively, as they vary with time because of adaptation. The derivations of the first four steps in Table 10.2 are straightforward. To derive the last two steps of the algorithm, we note that $s(n)$ is updated according to the recursive equation

$$s(n+1) = s(n) + \mu_s\frac{\partial y_s^2(n)}{\partial s(n)} \tag{10.44}$$

as our goal is to select $s(n)$ so that $\xi_s(s, w) = E[y_s^2(n)]$ is maximized. Note also from Eq. (10.42) that

$$y_s^2(n) = \frac{1 + s(n)}{1 - s(n)}\,y^2(n)$$

Differentiating this with respect to $s(n)$, with $\beta(n) = \partial y(n)/\partial s(n)$, gives $\partial y_s^2(n)/\partial s(n) = 2\left[\frac{1}{(1 - s(n))^2}y^2(n) + \frac{1 + s(n)}{1 - s(n)}y(n)\beta(n)\right]$, which leads to the last step of Table 10.2.

¹ The performance function $\xi_s(s, w)$ was first proposed by Farhang-Boroujeny and Wang (1997).

Figure 10.7 Plots of the performance function $\xi_s(s, w)$ for different values of $s$.


Table 10.2 Summary of adaptive IIR ALE (Algorithm 1).

$y(n) = (1 + s(n))w(n)y(n-1) - s(n)y(n-2) + (1 - s(n))(w(n)x(n) - x(n-1))$
$e(n) = x(n+1) - y(n)$
$\alpha(n) = (1 + s(n))w(n)\alpha(n-1) - s(n)\alpha(n-2) + (1 + s(n))y(n-1) + (1 - s(n))x(n)$
$w(n+1) = w(n) + 2\mu_w e(n)\alpha(n)$
$\beta(n) = (1 + s(n))w(n)\beta(n-1) - s(n)\beta(n-2) - (w(n)e(n-1) - e(n-2))$
$s(n+1) = s(n) + 2\mu_s\left[\frac{1}{(1 - s(n))^2}y^2(n) + \frac{1 + s(n)}{1 - s(n)}y(n)\beta(n)\right]$

Definitions: $\alpha(n) = \frac{\partial y(n)}{\partial w(n)}$, $\beta(n) = \frac{\partial y(n)}{\partial s(n)}$.
$\mu_w$ and $\mu_s$ are step-size parameters.
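The recursions of Table 10.2 translate almost line by line into code. The sketch below is an illustrative reading of Algorithm 1, not the book's MATLAB program: the function name, the default step-size constants (including the cubic dependence of the step-size for $w$ on $1 - s(n)$), the clipping ranges, and the `adapt` switch are assumptions for this example. With `adapt=False` the parameters are frozen, which allows a deterministic check that a filter centered on a sinusoid predicts its next sample.

```python
import math
import random

def ale_algorithm1(x, s0=0.25, w0=0.0, mu_s=0.0005, mu_w_scale=0.025,
                   s_range=(0.25, 0.9), w_limit=0.999, adapt=True):
    """One reading of Table 10.2; returns the outputs and parameter tracks."""
    s, w = s0, w0
    y1 = y2 = 0.0        # y(n-1), y(n-2)
    al1 = al2 = 0.0      # alpha(n-1), alpha(n-2)
    be1 = be2 = 0.0      # beta(n-1), beta(n-2)
    e1 = e2 = 0.0        # e(n-1), e(n-2)
    x1 = 0.0             # x(n-1)
    ys, ws, ss = [], [], []
    for n in range(len(x) - 1):
        xn = x[n]
        y = (1 + s) * w * y1 - s * y2 + (1 - s) * (w * xn - x1)
        e = x[n + 1] - y
        alpha = (1 + s) * w * al1 - s * al2 + (1 + s) * y1 + (1 - s) * xn
        beta = (1 + s) * w * be1 - s * be2 - (w * e1 - e2)
        if adapt:
            mu_w = mu_w_scale * (1 - s) ** 3          # assumed step-size rule
            w_new = w + 2 * mu_w * e * alpha
            s_new = s + 2 * mu_s * (y * y / (1 - s) ** 2
                                    + (1 + s) / (1 - s) * y * beta)
            w = min(max(w_new, -w_limit), w_limit)
            s = min(max(s_new, s_range[0]), s_range[1])
        y2, y1 = y1, y
        al2, al1 = al1, alpha
        be2, be1 = be1, beta
        e2, e1 = e1, e
        x1 = xn
        ys.append(y)
        ws.append(w)
        ss.append(s)
    return ys, ws, ss

theta0 = math.pi / 3
clean = [math.sin(theta0 * n) for n in range(400)]
# Frozen filter centered on the sinusoid: y(n) should approach x(n+1)
ys_fixed, _, _ = ale_algorithm1(clean, s0=0.5, w0=math.cos(theta0), adapt=False)

# Adaptive run on a noisy sinusoid (0 dB SNR); trajectories can be inspected
random.seed(2)
noisy = [math.sin(theta0 * n) + random.gauss(0.0, math.sqrt(0.5)) for n in range(5000)]
_, ws, ss = ale_algorithm1(noisy)
print(ws[-1], ss[-1])
```

Whether and how fast the adaptive run settles near $w = \cos\theta_0$ depends on the step-sizes and the noise level, so the final values printed above should be inspected rather than taken for granted.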
10.3.4 Robust Adaptation of w

Consider Figure 10.8a, where two plots of $\xi_w(s, w)$ are given corresponding to $\theta_0 = \pi/3$ and $\pi/15$, with $s$ fixed at 0.75. These plots show that the shape of the performance function $\xi_w(s, w)$ is sensitive to the value of $\theta_0$. In particular, we note that when $\theta_0$ is close to 0, the function $\xi_w(s, w)$ is nearly flat over most of the values of $w$, except when $w$ is very close to its optimum value. This would result in extremely slow convergence for any gradient-based algorithm, unless $w$ is initialized close to its optimum value. The same sensitivity is observed when $\theta_0$ is close to $\pi$.

This problem may be solved if we let $w = \cos\theta$ in Eq. (10.35) and adapt $\theta$ instead of $w$. With this amendment, the plots of the performance function $\xi_w(s, \cos\theta)$, as a function of $\theta$, are as shown in Figure 10.8b. We note that there is not much difference between the two plots in Figure 10.8b, as opposed to the pair in Figure 10.8a.

A robust implementation of the IIR ALE, which has reduced sensitivity to variations of $\theta_0$, may thus be obtained by considering the change of variable $w = \cos\theta$ in Eq. (10.35) and adaptive adjustment of $\theta$ instead of $w$. Table 10.3 gives a summary of the resulting algorithm, which we call Algorithm 2 for later reference. This algorithm, although more complicated than Algorithm 1 (because of the involvement of the sine and cosine functions), has been found to be much more robust when $\theta_0$ is close to 0 or $\pi$ (Farhang-Boroujeny and Wang, 1997).


10.3.5 Simulation Results 


In this section, we study the performance of the algorithms given in Tables 10.2 and 10.3 
using computer simulations. We also discuss a cascade implementation of the IIR ALE, 
which may be used for enhancement of multiple sinusoidal signals. 

Figure 10.9 presents a set of plots that show the convergence as well as the tracking behavior of Algorithms 1 and 2, when the ALE input is a single sinusoid in additive white Gaussian noise, as in Eq. (10.39). The simulated scenario consists of $\sigma_v^2 = 0.5$ and a unit amplitude sinusoid with its angular frequency, $\theta_0(n)$, varying as shown in the figure. This corresponds to a signal-to-noise ratio (SNR) of 0 dB. The step-sizes are selected (empirically) according to the following equations:

$$\mu_s = 0.0005, \quad \mu_w(n) = 0.025(1 - s(n))^3, \quad \text{and} \quad \mu_\theta(n) = 0.05(1 - s(n))^3$$

Note that the step-size parameters $\mu_w(n)$ and $\mu_\theta(n)$ are chosen to be time varying and are selected according to the present value of $s(n)$. This results in large step-sizes when $s(n)$ is small and small step-sizes as $s(n)$ approaches 1. The rationale behind this choice is the following. When $w(n)$ and $\theta(n)$ are far from their optimum values, $s(n)$ becomes small and hence it is better to use larger step-sizes to ensure faster convergence of the algorithm. On the other hand, when $w(n)$ and $\theta(n)$ are close to their optimum values, smaller step-sizes should be used to reduce the misadjustment of the algorithms. The above choices of $\mu_w(n)$ and $\mu_\theta(n)$ also compensate for the change of slope of the performance function $\xi_w(s, w)$ as $s(n)$ takes different values (see Figure 10.6). Thus, the equations proposed for adjusting $\mu_w(n)$ and $\mu_\theta(n)$ are based on these intuitions as well as a wide range of simulation tests. The parameter $s(n)$ is initialized to 0.25 in the beginning of each simulation and is allowed to vary in the range from 0.25 to 0.9. For Algorithm 1, the parameter $w(n)$ is initialized to 0 ($= \cos(\pi/2)$) and is confined to the range from $-0.999$ to 0.999. Similarly, for Algorithm 2, $\theta(n)$ is initialized to $\pi/2$ and is confined to the range from $\cos^{-1}(0.999)$ to $\cos^{-1}(-0.999)$. The results clearly show the superior performance of Algorithm 2. In particular, observe that as $\theta_0(n)$ approaches 0, its estimate, $\theta(n)$, becomes noisier when Algorithm 1 is used. In contrast, Algorithm 2 is much more robust.

Figure 10.8 (a) Plots showing the variation of the performance function $\xi_w(s, w)$ as $\theta_0$ approaches 0 and (b) plots showing reduced sensitivity of the performance function $\xi_w(s, \cos\theta)$ to variations in $\theta_0$.

Table 10.3 Summary of adaptive IIR ALE: Algorithm 2.

$w(n) = \cos\theta(n)$
$w'(n) = \sin\theta(n)$
$y(n) = (1 + s(n))w(n)y(n-1) - s(n)y(n-2) + (1 - s(n))(w(n)x(n) - x(n-1))$
$e(n) = x(n+1) - y(n)$
$\alpha(n) = (1 + s(n))w(n)\alpha(n-1) - s(n)\alpha(n-2) - w'(n)[(1 + s(n))y(n-1) + (1 - s(n))x(n)]$
$\theta(n+1) = \theta(n) + 2\mu_\theta e(n)\alpha(n)$
$\beta(n) = (1 + s(n))w(n)\beta(n-1) - s(n)\beta(n-2) - (w(n)e(n-1) - e(n-2))$
$s(n+1) = s(n) + 2\mu_s\left[\frac{1}{(1 - s(n))^2}y^2(n) + \frac{1 + s(n)}{1 - s(n)}y(n)\beta(n)\right]$

Definitions: $\alpha(n) = \frac{\partial y(n)}{\partial\theta(n)}$, $\beta(n) = \frac{\partial y(n)}{\partial s(n)}$.
$\mu_\theta$ and $\mu_s$ are step-size parameters.

To enhance or extract multiple sinusoids, we may use a cascade of a few IIR ALEs as in Figure 10.10. This configuration corresponds to an $L$-stage line enhancer, where each stage is responsible for enhancement of one single sinusoid. The output error from each stage is the input to the next stage. The enhanced narrow-band outputs, $y_i(n)$'s, from the successive stages are added together to obtain the final output, $y(n)$, of the line enhancer. The adaptation of the multistage line enhancer begins with its first stage. The adaptation of the following stages begins once the previous stages have converged. Experiments have shown that this method works well (Cho and Lee, 1993). To decide on activating/deactivating the successive stages of the multistage IIR ALE we may use the parameters $s(n)$'s of the previous stages. We know that for each stage $s(n)$ increases and approaches 1 only when $w(n)$ (or $\theta(n)$) is near its optimum value. Thus, by comparing $s(n)$ of each stage with a threshold level, we may decide on activating or deactivating the adaptation of the following stage(s). This provides a very simple and effective mechanism for controlling the adaptation of the cascaded IIR ALE.

Figure 10.9 Simulation results illustrating the convergence as well as tracking behavior of the IIR ALE: (a) Algorithm 1 and (b) Algorithm 2.

Figure 10.10 Cascaded IIR ALE for enhancement of multiple sinusoids buried in white noise.


Figure 10.11 illustrates the performance of the cascaded IIR ALE when Algorithm 2 is used. The input signal consists of the sum of four sinusoids in additive white Gaussian noise, and is given by

$$x(n) = \sin(\omega_1 n + \phi_1) + 2\sin(\omega_2 n + \phi_2) + 0.25\sin(\omega_3 n + \phi_3) + 0.5\sin(\omega_4 n + \phi_4) + v(n)$$

where $\omega_1$, $\omega_2$, $\omega_3$, and $\omega_4$ are equal to $\pi/1.8$, $\pi/3.5$, $\pi/6$, and $\pi/12$, respectively, $\phi_1$ to $\phi_4$ are random phases that are selected in the beginning of each simulation trial and remain fixed during that trial, and $\sigma_v^2 = 0.25$. This value corresponds to SNRs of 3, 9, $-9$, and $-3$ dB, respectively, for the individual sinusoids. As the energies of the input signals to successive stages of the ALE are different, the step-sizes of each stage are normalized to the energy of the input signal to that stage. The equations used for this purpose are $\mu_{s,i} = 0.0005/\hat\sigma_i^2$ and $\mu_{\theta,i}(n) = 0.01(1 - s_i(n))^3/\hat\sigma_i^2$. In these equations, $i$ refers to the stage number, and $\hat\sigma_i^2$ is an estimate of the energy of the input signal, $x_i(n)$, to the $i$th stage. The following recursive equation is used for estimation of $\hat\sigma_i^2$:

$$\hat\sigma_i^2(n) = 0.98\,\hat\sigma_i^2(n-1) + 0.02\,x_i^2(n).$$
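The quoted SNR figures and the energy-normalized step-sizes are simple to reproduce. In the sketch below, the function names, the small floor `eps` that avoids division by zero at start-up, and the constant test input are assumptions made for illustration; the constants 0.98, 0.02, 0.0005, and 0.01, and the cubic dependence on $1 - s_i(n)$, follow the reading of the step-size equations given above.

```python
import math

def snr_db(amplitude, sigma_v2):
    """SNR in dB of a sinusoid with the given amplitude in white noise."""
    return 10.0 * math.log10((amplitude ** 2 / 2.0) / sigma_v2)

def stage_step_sizes(x_stage, s_stage, eps=1e-12):
    """Track the stage input energy recursively and scale the step-sizes by it."""
    sigma2_hat = 0.0
    mu_s, mu_theta = [], []
    for xn, sn in zip(x_stage, s_stage):
        sigma2_hat = 0.98 * sigma2_hat + 0.02 * xn * xn
        scale = max(sigma2_hat, eps)               # assumed guard, not in the text
        mu_s.append(0.0005 / scale)
        mu_theta.append(0.01 * (1.0 - sn) ** 3 / scale)
    return sigma2_hat, mu_s, mu_theta

# SNRs of the four sinusoids for sigma_v^2 = 0.25
snrs = [round(snr_db(a, 0.25)) for a in (1.0, 2.0, 0.25, 0.5)]
print(snrs)

# A constant input of amplitude 2 has energy 4; the estimate should settle there
sigma2_hat, mu_s, mu_theta = stage_step_sizes([2.0] * 2000, [0.25] * 2000)
print(sigma2_hat, mu_s[-1], mu_theta[-1])
```

The recursion is a first-order lowpass filter on $x_i^2(n)$ with unit DC gain, so its memory is on the order of 50 samples; a later stage therefore sees its step-sizes grow automatically once the earlier stages have removed the stronger sinusoids from its input.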


The parameters $s_i(n)$'s are allowed to change between 0.25 and 0.9, and the threshold level used for activating or deactivating the adaptation of the following stages is set at 0.85. The results in Figure 10.11 show that this mechanism works very well. It may also be noted that the first stage is tuned to the strongest sinusoid ($\omega_2$), and the last stage is tuned to the weakest one ($\omega_3$). Such an observation is intuitively sound. The MATLAB programs used to generate the results of this section are available on an accompanying website. The reader is encouraged to examine these programs and run further simulations to learn more about the line enhancer as well as the difficulties that one may encounter in using IIR adaptive filters. It would also be interesting to compare the behavior of FIR and IIR line enhancers. This is left as an exercise for interested readers.

Figure 10.11 Simulation results showing convergence of the cascaded IIR ALE when used to detect/enhance multiple sinusoids: (a) angular frequencies and (b) $s(n)$ parameters. Algorithm 2 is used for the adaptation.
10.4 Case Study II: Equalizer Design for Magnetic Recording Channels


Figure 10.12 depicts the block diagram of the magnetic recording channel that we wish to address in this section. This channel is characterized by its continuous time impulse response, $h_a(t)$. The subscript $a$ emphasizes that $h_a(t)$ is an analog quantity, that is, it is a continuous function in amplitude as well as time, $t$. As was noted in Chapter 1 (Section 1.6.2), $h_a(t)$ is also called the dibit response and is usually modeled as the superposition of positive and negative Lorentzian pulses, separated by one bit interval, $T$. That is,

$$h_a(t) = g_a(t) - g_a(t - T) \tag{10.45}$$

where $g_a(t)$ is the Lorentzian pulse defined as

$$g_a(t) = \frac{1}{1 + (2t/t_{50})^2} \tag{10.46}$$

and it is the response of the channel to a step input. The parameter $t_{50}$, which is the pulse-width of $g_a(t)$ measured at 50% of its maximum amplitude, is an indicator of the recording density. The recording density, $D$, is specified by the ratio $t_{50}/T$. Clearly, higher density implies denser storage and vice versa.
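A few lines of code make the pulse definitions concrete. The sketch below evaluates the Lorentzian pulse and the dibit response for a given density; the function names and the numerical values (bit interval 1, density 2.5) are illustrative assumptions.

```python
def lorentzian(t, t50):
    """Lorentzian pulse of Eq. (10.46); equals 0.5 at t = +/- t50 / 2."""
    return 1.0 / (1.0 + (2.0 * t / t50) ** 2)

def dibit(t, t50, T):
    """Dibit response of Eq. (10.45): two opposite pulses one bit interval apart."""
    return lorentzian(t, t50) - lorentzian(t - T, t50)

T = 1.0                 # bit interval (assumed time unit)
D = 2.5                 # hypothetical recording density, D = t50 / T
t50 = D * T

half_width_value = lorentzian(t50 / 2.0, t50)
# The dibit is odd-symmetric about t = T / 2
symmetry_gap = dibit(T / 2 + 0.3, t50, T) + dibit(T / 2 - 0.3, t50, T)
samples = [dibit(k * T, t50, T) for k in range(-4, 6)]
print(half_width_value, symmetry_gap)
print([round(v, 4) for v in samples])
```

As the density grows, the two pulses overlap more and the dibit becomes smaller and more spread out over neighboring bit intervals, which is why the choice of target response depends on $D$.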

The response of the channel to the data bits,² $s(n)$, is then

$$x_a(t) = \sum_n s(n)h_a(t - nT) + v_a(t) \tag{10.47}$$

where $v_a(t)$ is the channel noise. The detector assumes that its input is the convolution of the data bits, $s(n)$, with a known response, called the target response, and it uses this information in doing the detection. Hence, our aim is to design an analog equalizer (filter) whose impulse response, $w_a(t)$, when convolved with the dibit response, $h_a(t)$, matches the desired target response as closely as possible. In particular, we are interested in matching the combined response of the channel and equalizer, that is,

$$\eta_a(t) = \int_{-\infty}^{\infty} w_a(\tau)h_a(t - \tau)\,d\tau \tag{10.48}$$

with the target response at sampling instants separated by the bit interval, $T$. As was noted in Chapter 1 (Section 1.6.2), the target response in magnetic channels is usually one of the class-IV partial responses characterized by the transfer functions

$$\Gamma(z) = z^{-\Delta}(1 + z^{-1})^K(1 - z^{-1}) \tag{10.49}$$

where $z^{-1}$ represents a one-bit delay, $\Delta$ is a parameter that accounts for the delays introduced by the channel and equalizer, and $K$ is an integer greater than or equal to 1. The choice of $K$ depends on the recording density, $D$. The value of $K$ also determines the complexity of the detector. The commonly used values of $K$ are 1, 2, and 3.

Figure 10.12 Model of a magnetic recording channel.

² The data bits $s(n)$ are assumed to take values $+1$ and $-1$.

Next, we go through a sequence of discussions that lead us to a design methodology, using the results of this chapter as well as the previous chapters, for designing analog equalizers in the application of magnetic recording channels.


10.4.1 Channel Discretization 


As all of the derivations in this book are based on sampled signals, we would like to replace the continuous time channel and equalizer impulse responses, $h_a(t)$ and $w_a(t)$, respectively, by their associated discrete-time counterparts. Define the sequences

$$h_i = h_a(iT_s) \quad \text{and} \quad w_i = w_a(iT_s)$$

where $T_s$ is the sampling period. When $T_s$ is sufficiently small, we obtain, from Eq. (10.48),

$$\eta_i = \eta_a(iT_s) \approx T_s \cdot (h_i \star w_i) \tag{10.50}$$

where $\star$ denotes convolution. The approximation (10.50) follows from Eq. (10.48) by setting $t = iT_s$ and approximating the integration on the right-hand side of Eq. (10.48) by a summation.

We note that the accuracy of the approximation used in Eq. (10.50) depends on the value of $T_s$. In practice, $T_s$ has to be selected a few times smaller than the bit interval, $T$, for the results to be reasonably accurate. In the design procedure that we develop here we select $T_s$ so that $T = LT_s$, where $L$ is an integer greater than 1. We call $L$ the oversampling factor. Reasonable values of $L$ are in the range of 4 to 10.

In the rest of our discussion, we assume that the time scale is normalized so that $T_s = 1$. We also ignore the nonexactness of Eq. (10.50) and thus obtain

$$\eta_i = h_i \star w_i \tag{10.51}$$

as the discrete-time combined response of the channel and equalizer.
10.4.2 Design Steps


The following steps are taken in designing analog equalizers for magnetic recording channels:³


1. Using the sampled channel response, $h_i$, and the statistics of the channel noise (e.g., autocorrelation of the noise at the channel output), a fractionally tap-spaced FIR equalizer⁴ is designed. The criterion that we use in this design is the mean-squared error between the signal samples at the equalizer output and the desired signal that is obtained by passing the data sequence, $s(n)$, through the target response $\Gamma(z)$, as in Figure 10.12.

2. A discrete-time IIR filter whose impulse response best matches the designed FIR equalizer is found. The equation error method will be used to find this match.

3. The discrete-time IIR filter obtained in Step 2 is then converted into an equivalent analog filter, as the desired analog equalizer.


Next, we proceed with the details of the above steps. 


10.4.3 FIR Equalizer Design 


We define the equalizer tap-input and tap-weight vectors as

$$\mathbf{x}(n) = [x(n)\ x(n-1) \cdots x(n-N+1)]^T \tag{10.52}$$

and

$$\mathbf{w} = [w_0\ w_1 \cdots w_{N-1}]^T \tag{10.53}$$

respectively. The equalizer output is then

$$y(n) = \mathbf{w}^T\mathbf{x}(n) \tag{10.54}$$

We note that the samples of the equalizer input, $x(n)$, and output, $y(n)$, are at $T_s$ intervals. However, in the optimization of the equalizer tap weights, $w_i$'s, we are only interested in samples of $y(n)$ at $T = LT_s$ intervals. Hence, we define the error as

$$e(n) = d(n) - y(nL) \tag{10.55}$$

and, accordingly, the performance function

$$\xi = E[e^2(n)] \tag{10.56}$$

where

$$d(n) = \sum_i \gamma_i s(n-i) \tag{10.57}$$


³ The design procedure discussed here follows Mathew, Farhang-Boroujeny, and Wood (1997).
⁴ The term fractionally tap-spaced equalizer refers to the fact that the spacing between the successive taps of the equalizer, $w_i$, is $T_s$ which, as noted earlier, is a few times smaller than the bit interval, $T$.

and $\gamma_i$'s are the samples of the target response that are obtained by taking the inverse z-transform of $\Gamma(z)$ (see Figure 10.12). As an example, when $K = 2$,

$$\Gamma(z) = z^{-\Delta}(1 + z^{-1})^2(1 - z^{-1}) = z^{-\Delta} + z^{-(\Delta+1)} - z^{-(\Delta+2)} - z^{-(\Delta+3)}$$

and this gives

$$\gamma_i = \begin{cases} 1, & \text{for } i = \Delta \text{ and } \Delta + 1 \\ -1, & \text{for } i = \Delta + 2 \text{ and } \Delta + 3 \\ 0, & \text{otherwise} \end{cases}$$
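The target response samples for any $K$ follow from repeated polynomial multiplication, and the desired signal is then a short convolution with the data bits. The sketch below assumes the class-IV form $\Gamma(z) = z^{-\Delta}(1 + z^{-1})^K(1 - z^{-1})$; the function names and the example data are illustrative.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def target_response(K, delay=0):
    """Samples gamma_i of z^-delay (1 + z^-1)^K (1 - z^-1)."""
    taps = [1, -1]                          # 1 - z^-1
    for _ in range(K):
        taps = poly_mul(taps, [1, 1])       # multiply by 1 + z^-1
    return [0] * delay + taps

def desired_signal(s, gamma):
    """d(n) = sum_i gamma_i s(n - i), with s(n) = 0 for n < 0."""
    return [sum(g * s[n - i] for i, g in enumerate(gamma) if n - i >= 0)
            for n in range(len(s))]

for K in (1, 2, 3):
    print(K, target_response(K))
print(target_response(2, delay=2))
print(desired_signal([1, -1, 1, 1], target_response(1)))
```

For $K = 2$ this reproduces the samples listed above, and the number of nonzero samples, $K + 2$ at most, gives a feel for how the detector complexity grows with $K$.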


We also note that the dibit response, $h_a(t)$, is noncausal. To come up with a realizable equalizer, we need to shift $h_a(t)$ to the right by a sufficient length, $t_0$, such that the remaining noncausal part of the shifted dibit can be ignored. This is done by replacing $h_a(t)$ with $h_a(t - t_0)$ in the earlier results and assuming that $h_a(t - t_0) = 0$, for $t < 0$. We also redefine the sampled dibit response, $h_i$, as

$$h_i = h_a(iT_s - t_0).$$

Furthermore, we note that the samples $h_i$ for sufficiently large values of $i$ are small and thus may also be ignored. Hence, for our further derivations, we define the dibit vector

$$\mathbf{h} = [h_0\ h_1 \cdots h_M]^T$$

where $M$ is a sufficiently large integer such that the values of $h_i$ for $i > M$ are negligible. Also, for convenience of the derivations that follow, we assume that $M$ is an integer multiple of the oversampling factor, $L$.

We now turn back to the performance function $\xi$, which was introduced earlier in Eq. (10.56). Using Eqs. (10.54) and (10.55) in Eq. (10.56), and solving the corresponding Wiener-Hopf equation, which follows from $\nabla_{\mathbf{w}}\xi = 0$, we get the optimum tap-weight vector of the desired FIR equalizer as

$$\mathbf{w}_{\rm opt} = \mathbf{R}^{-1}\mathbf{p} \tag{10.58}$$

where $\mathbf{R} = E[\mathbf{x}(nL)\mathbf{x}^T(nL)]$ and $\mathbf{p} = E[d(n)\mathbf{x}(nL)]$. Here, the definition of the column vector $\mathbf{x}(nL)$ follows Eq. (10.52).

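To obtain an explicit expression for $\mathbf{w}_{\text{opt}}$, we note that the elements $h_i$ are at $T_s$ intervals, 
but the data bits, $s(n)$, and the target response, $y_i$, are at $T = LT_s$ intervals. Noting this 
and assuming that $N$ is an integer multiple of $L$, we obtain 

$$\mathbf{x}(nL) = \mathbf{H}\mathbf{s}(n) + \mathbf{v}(nL) \tag{10.59}$$

where 

$$\mathbf{s}(n) = [s(n) \; s(n-1) \; \cdots \; s(n - N/L + 1)]^T \tag{10.60}$$

$$\mathbf{H} = \begin{bmatrix}
h_0 & h_L & h_{2L} & h_{3L} & \cdots & h_M & 0 & \cdots \\
0 & h_{L-1} & h_{2L-1} & h_{3L-1} & \cdots & h_{M-L-1} & h_{M-1} & \cdots \\
0 & h_{L-2} & h_{2L-2} & h_{3L-2} & \cdots & h_{M-L-2} & h_{M-2} & \cdots \\
\vdots & & & & & & & \\
0 & h_0 & h_L & h_{2L} & \cdots & h_{M-L} & h_M & \cdots \\
0 & 0 & h_{L-1} & h_{2L-1} & \cdots & h_{M-L-1} & h_{M-1} & \cdots \\
0 & 0 & h_{L-2} & h_{2L-2} & \cdots & h_{M-L-2} & h_{M-2} & \cdots \\
\vdots & & & & & & & \\
0 & 0 & h_0 & h_L & \cdots & h_{M-L} & h_M & \cdots
\end{bmatrix} \tag{10.61}$$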

and v(nL) is the associated vector of samples of the channel noise, v, (t). We may also 
write Eq. (10.57) as 
$$d(n) = \mathbf{y}^T\mathbf{s}(n) \tag{10.62}$$

where $\mathbf{y}$ is the column vector consisting of the samples of the target response, the $y_i$'s. The 
length of $\mathbf{y}$ is appropriately selected by appending extra zeros at its end so that it would 
be compatible with $\mathbf{s}(n)$. 

Using the above results and assuming that the binary process $s(n)$ and the noise process, 
$v(n)$, are white and independent of one another, we obtain 

$$\mathbf{R} = \mathbf{H}\mathbf{H}^T + \sigma_v^2\mathbf{I} \tag{10.63}$$

where $\sigma_v^2$ is the variance of $v(n)$ and $\mathbf{I}$ is the identity matrix. In arriving at Eq. (10.63), 
we have also used the fact that $E[\mathbf{s}(n)\mathbf{s}^T(n)] = \mathbf{I}$ because $s(n)$ is white with values $\pm 1$. 
Similarly, we also obtain 

$$\mathbf{p} = \mathbf{H}\mathbf{y} \tag{10.64}$$

Substituting Eqs. (10.63) and (10.64) in Eq. (10.58), we obtain the following explicit 
equation for the desired optimum fractionally tap-spaced FIR equalizer: 

$$\mathbf{w}_{\text{opt}} = (\mathbf{H}\mathbf{H}^T + \sigma_v^2\mathbf{I})^{-1}\mathbf{H}\mathbf{y} \tag{10.65}$$
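
As a rough illustration of Eq. (10.65), the following sketch forms R = H H^T + sigma_v^2 I and p = H y and solves the resulting linear system. It is not the book's `iirdsgn.m`. The helper names, the toy dibit samples, and the rule `H[i][m] = h[m*L - i]` used to fill the matrix are assumptions made here for the example (the rule follows from sampling the channel output at the T_s rate).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def build_H(h, L, N, cols):
    """Assumed fill rule: row i is sample x(nL - i), column m is bit s(n - m)."""
    M = len(h) - 1
    return [[h[m * L - i] if 0 <= m * L - i <= M else 0.0
             for m in range(cols)] for i in range(N)]


def optimum_equalizer(H, y, sigma2):
    """w_opt = (H H^T + sigma2 I)^-1 H y, as in Eq. (10.65)."""
    N, P = len(H), len(H[0])
    R = [[sum(H[i][k] * H[j][k] for k in range(P)) + (sigma2 if i == j else 0.0)
          for j in range(N)] for i in range(N)]
    p = [sum(H[i][k] * y[k] for k in range(P)) for i in range(N)]
    return solve(R, p)


# Toy numbers (hypothetical): M = 6, oversampling L = 2, N = 4 equalizer taps.
h = [0.1, 0.5, 1.0, 0.4, -0.6, -1.0, -0.3]
L, N = 2, 4
cols = (len(h) - 1 + N - 1) // L + 1        # enough bits to cover every row
H = build_H(h, L, N, cols)
y = [1.0, 1.0, -1.0, -1.0, 0.0]             # target samples, zero-padded
print(optimum_equalizer(H, y, sigma2=0.01))
```

Solving the linear system directly, rather than forming the matrix inverse, is the usual numerical choice for a small problem of this kind.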


10.4.4 Conversion from FIR into IIR Equalizer 


As the next step in designing analog equalizers, we need to find an IIR filter whose 
response closely matches the designed FIR equalizer. For this, we use the method of 
equation error that was discussed in Section 10.2. With reference to Figure 10.4, in the 
present context of magnetic recording, x(n) is the channel output, G(z) is the designed 
FIR equalizer, $e_o(n) = 0$, for all n, A(z) and B(z) are polynomials, which define the 
transfer function $W(z)$ of the desired IIR filter according to Eq. (10.1), and the unit delay 
$z^{-1}$ is equivalent to one $T_s$ interval. We can use the LMS or any other adaptive filtering 
algorithm to find the coefficients of A(z) and B(z). We may also adopt an analytical 
method and develop a closed-form solution for the coefficients of A(z) and B(z), or, use 
time averages to estimate the coefficients of the related Wiener—Hopf equation. We use 
the last method in the numerical examples discussed below, because of convenience. 
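
The time-average approach can be sketched as follows. This is only an illustration under stated assumptions: the IIR filter is taken as W(z) = A(z)/(1 - B(z)), the equation error is d(n) - sum_i a_i x(n-i) - sum_i b_i d(n-i), and, to make the result checkable, the reference d(n) is generated here by a known low-order IIR system rather than by a designed FIR equalizer. All function names are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def simulate_iir(x, a, b):
    """d(n) = sum_i a[i] x(n-i) + sum_i b[i-1] d(n-i), i.e. A(z) / (1 - B(z))."""
    d = []
    for n in range(len(x)):
        acc = sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
        acc += sum(b[i - 1] * d[n - i] for i in range(1, len(b) + 1) if n - i >= 0)
        d.append(acc)
    return d


def equation_error_fit(x, d, na, nb):
    """Estimate A(z), B(z) from time-averaged correlations (Wiener-Hopf form)."""
    start, size = max(na, nb), na + 1 + nb
    R = [[0.0] * size for _ in range(size)]
    p = [0.0] * size
    count = len(x) - start
    for n in range(start, len(x)):
        # Regressor: current and past inputs, then past reference outputs.
        u = [x[n - i] for i in range(na + 1)] + [d[n - i] for i in range(1, nb + 1)]
        for r in range(size):
            p[r] += d[n] * u[r] / count
            for c in range(size):
                R[r][c] += u[r] * u[c] / count
    theta = solve(R, p)
    return theta[:na + 1], theta[na + 1:]


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
d = simulate_iir(x, a=[0.5, 0.2], b=[0.6])
a_hat, b_hat = equation_error_fit(x, d, na=1, nb=1)
print(a_hat, b_hat)
```

Because the reference here is noise-free and exactly of the assumed order, the estimates reproduce the generating coefficients; with an FIR reference the fit is only approximate.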
10.4.5 Conversion from z Domain into s Domain 


Among the different methods available for conversion between s-domain and z-domain 
transfer functions, we discuss the method of impulse invariance. To give a brief introduction 
to this method, we consider a causal continuous-time system with the transfer 
function 

$$H_a(s) = \sum_i \frac{a_i}{s - s_i} \tag{10.66}$$

with $s_i$'s being the poles of the system. The impulse response of this system is 

$$h_a(t) = \begin{cases} \sum_i a_i e^{s_i t}, & \text{for } t \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{10.67}$$

Now, if we consider a discrete-time system whose unit-sample (impulse) response is given 
by the samples $h_a(0), h_a(T_s), h_a(2T_s), \ldots$, its transfer function will be 

$$\begin{aligned}
H(z) &= \sum_{k=-\infty}^{\infty} h_a(kT_s)z^{-k} \\
&= \sum_{k=0}^{\infty}\sum_i a_i e^{s_i kT_s}z^{-k} \\
&= \sum_i\sum_{k=0}^{\infty} a_i e^{s_i kT_s}z^{-k} \\
&= \sum_i \frac{a_i}{1 - e^{s_i T_s}z^{-1}}
\end{aligned} \tag{10.68}$$

The reverse of this conversion is obvious. That is, if 

$$H(z) = \sum_i \frac{a_i}{1 - z_i z^{-1}} \tag{10.69}$$

is the transfer function of a discrete-time system with unit-sample response $h_n$, the transfer 
function of the continuous-time system whose impulse response samples, at $T_s$ intervals, 
are the sequence $h_n$, is 

$$H_a(s) = \sum_i \frac{a_i}{s - \frac{\ln z_i}{T_s}} \tag{10.70}$$
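
The reverse mapping can be checked numerically. The sketch below assumes a partial-fraction form with residues a_i and simple z-domain poles z_i, and uses the principal branch of the complex logarithm for s_i = ln(z_i)/T_s (other branches give the same samples but different continuous-time responses). The function names and the numbers are illustrative only.

```python
import cmath


def z_poles_to_s_poles(z_poles, Ts):
    """Impulse invariance, reverse direction: s_i = ln(z_i) / Ts (principal branch)."""
    return [cmath.log(z) / Ts for z in z_poles]


def h_discrete(residues, z_poles, k):
    """Unit-sample response h_k = sum_i a_i z_i^k for k >= 0."""
    return sum(a * z ** k for a, z in zip(residues, z_poles))


def h_analog(residues, s_poles, t):
    """Impulse response h_a(t) = sum_i a_i exp(s_i t) for t >= 0."""
    return sum(a * cmath.exp(s * t) for a, s in zip(residues, s_poles))


Ts = 0.25
z_poles = [0.9 * cmath.exp(0.4j), 0.9 * cmath.exp(-0.4j)]   # conjugate pair
residues = [0.5 - 0.2j, 0.5 + 0.2j]
s_poles = z_poles_to_s_poles(z_poles, Ts)

# Samples of the continuous-time response at multiples of Ts match h_k.
for k in range(5):
    print(k, h_discrete(residues, z_poles, k).real, h_analog(residues, s_poles, k * Ts).real)
```

Poles inside the unit circle map to the left half of the s-plane, so a stable discrete-time design stays stable after the conversion.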


10.4.6 Numerical Results 


To highlight some of the features of the design method that was developed earlier, we 
present some numerical results using the Lorentzian pulse (see Eqs. (10.45) and (10.46)) 
as the model for the magnetic recording channel. The measure used for evaluating the 
designed equalizer is the SNR at the detector input. It is defined as 


$$\text{detection SNR} = \frac{\sum_i y_i^2}{\sum_i (\eta_i - y_i)^2 + \sigma_v^2\sum_i w_i^2} \tag{10.71}$$

The value of the variance of the channel noise, $\sigma_v^2$, is selected based on another SNR that 
is defined at the equalizer input (or channel output) as 

$$\text{SNR at the equalizer input} = \frac{\sum_i h_i^2}{\sigma_v^2} \tag{10.72}$$
It may be noted from Eq. (10.71) that the noise at the detector input is considered to be 
the sum of channel noise at the equalizer output and residual intersymbol interference 
(ISI). Further exploration of this definition is left as an exercise for interested readers. 
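
The two SNR measures can be evaluated with a few lines of code. The sketch assumes the detection SNR is the target-response energy divided by the sum of the residual ISI energy and the noise power at the equalizer output, with eta_i denoting the equalized dibit samples (the form stated in Problem P10.14). The function names and numbers are hypothetical.

```python
import math


def noise_variance_for_input_snr(h, snr_db):
    """Noise variance that gives the requested SNR at the equalizer input."""
    return sum(v * v for v in h) / 10 ** (snr_db / 10)


def detection_snr_db(eta, y, w, sigma2):
    """Target energy over (residual ISI + equalizer output noise), in dB."""
    signal = sum(v * v for v in y)
    isi = sum((e - t) ** 2 for e, t in zip(eta, y))
    noise = sigma2 * sum(c * c for c in w)
    return 10 * math.log10(signal / (isi + noise))


# Toy check: perfect equalization (eta == y) with a single unit tap.
y = [1.0, 1.0, -1.0, -1.0]
print(detection_snr_db(eta=y, y=y, w=[1.0], sigma2=0.04))
```

When the equalized response matches the target exactly, only the noise term remains in the denominator, so the detection SNR is set by the equalizer's noise gain.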

We present results of many designs that are obtained for various choices of channel 
parameters. Evaluating these results, we find that, among different parameters, the performance 
of the IIR equalizer is highly affected by the choice of the delays $t_0$ and $\Delta$. 
Furthermore, the choice of $t_0$ is closely related to the value of $\Delta$. In a good design, 
usually, $t_0 = \Delta T + \tau$ where $T$ (as defined before) is the bit interval and $\tau$ is a relatively 
small delay in the range from 0 to $3T$. The best value of $\tau$, which results in maximum 
detection SNR, depends on the number of zeros and poles of the IIR equalizer, noise 
level in the channel, and recording density, D. The effect of these on the optimum value 
of $\tau$ is difficult to predict. It appears that the only way of finding the optimum $\tau$ is to 
design many IIR equalizers for different values of $\tau$ and choose the best among them. In 
Figure 10.13, we show how the detection SNR varies as a function of $\tau/T$, for certain 
selected choices of recording density, D, SNR at the equalizer input, and number of poles 
and zeros of the IIR equalizer. The value of K is set at 2 in this set of results. These 
plots clearly indicate that the choice of $\tau$ is very critical in the final performance of the 
IIR equalizer. 

Figure 10.14 shows an example of the equalized dibit response of the magnetic recording 
channel. This is obtained by passing the dibit response $h_i$ through the designed IIR 
equalizer. The parameters used to obtain these results are: D = 2.5, $\tau/T$ = 0.3, SNR at 
the equalizer input = 30 dB, IIR equalizer with 5 zeros and 6 poles, and K = 2. Observe 
that an almost perfect match between the equalizer output and the target response has 
been achieved here. 

The MATLAB program that has been used to obtain these results is available on an 
accompanying website. It is called iirdsgn.m. The reader is encouraged to run this 
program for other designs to deepen their understanding of the concepts that were 
discussed earlier. 


10.5 Concluding Remarks 


In this chapter, we discussed the problem of IIR adaptive filtering. We noted that unlike 
FIR adaptive filters, whose adaptation is a rather straightforward task, adaptive adjustment 
of IIR filters is, in general, a complicated problem. IIR adaptive filters can easily become 
unstable as their poles may get shifted out of the unit circle by the adaptation process. Or, 
they can get trapped in one of the local minima points because the performance surfaces 
of IIR filters are, in general, multimodal. We saw that these problems could be resolved 
by either limiting ourselves to applications where special transfer functions with unimodal 
performance surfaces could be used, or using the method of equation error, which leads 
to a suboptimal solution. 

Figure 10.13 Performance of the IIR equalizers designed for different choices of the parameters: 
(a) 3 zeros and 4 poles, D = 2; (b) 5 zeros and 6 poles, D = 2; (c) 3 zeros and 4 poles, D = 2.5; 
and (d) 5 zeros and 6 poles, D = 2.5. The three plots in each case correspond to the following 
choices of channel SNR: 25 dB (—); 30 dB (---); 35 dB (------). 


We also presented two case studies, one for each of the above solutions. These studies 
showed some of the difficulties that one may encounter while dealing with IIR adaptive 
filters — problems that do not arise when FIR adaptive filters are used. In the first case 
study, we used a specific transfer function for realization of line enhancers. There were 
many considerations that we had to take note of before getting to our final solution. 
For instance, we saw that our initial transfer function gets into difficulties when the 
frequency of the sinusoid is close to 0 or $\pi$. For this, we found a specific solution, 
namely, replacement of the parameter $w$ by $\cos\theta$ and adapting $\theta$ instead of $w$. We also 
had to take care of the parameter $s$ of this structure in a very special way. In contrast 
to this, if we refer to Chapter 6 (Section 6.4.3) where we used an FIR filter to realize a 
line enhancer, we find that none of the abovementioned kinds of problems exist. The only 
point that we must consider while using an FIR filter is to include a sufficient number of 
taps. As was noted in the beginning of this chapter, the main advantage of IIR adaptive 
filters, as compared with their FIR counterparts, is their lower order, which may lead to 
a lower computational complexity and hence reduction in the cost of implementation. 

Figure 10.14 An example of the equalized dibit response of the magnetic recording channel for 
K = 2, D = 2.5, $\tau/T$ = 0.3, channel SNR = 30 dB, and an IIR equalizer with 5 zeros and 6 poles. 
The circles correspond to the target response samples. 

The second case study that we discussed was equalization of magnetic recording channels. 
In this application, we found that the optimum IIR equalizers are very sensitive to a 
delay parameter, $\tau$. The results indicated that varying this delay even around its optimum 
value can significantly affect the performance of the resulting equalizer (Figure 10.13). 
A similar study for FIR equalizers shows that they do not exhibit such a level of sensitivity. 
In fact, FIR equalizers are very robust in this respect. A study of this problem is left as 
an exercise for the reader. 

To conclude, our study in this chapter showed that although the IIR adaptive filters are 
attractive for some specific applications, it may not be possible to use them (directly) for 
any arbitrary application. This is unlike the FIR (transversal) adaptive filters, which are 
very versatile adaptive systems. While using IIR adaptive filters, special care has to be 
taken in the selection of transfer function and/or performance function depending upon 
the kind of application that we are dealing with. 

At this point, we shall add that much of the research work in IIR adaptive filters has 
been carried out in the context of system modeling. Literature on this topic is much wider 
than what we could cover within the limits of a single chapter of this book. An excellent 
paper by John J. Shynk (1989) provides a good review of the fundamental work done 
in this subject. A good bibliography of the key references is also provided in this paper. 
Another interesting and classic reference is the book by Ljung and Söderström (1983). 


Problems 


P10.1 Start with Eq. (10.4) and use Eqs. (10.11) and (10.12) to give detailed derivations 
of Eqs. (10.15) and (10.16), respectively. 

P10.2 For the modeling problem that was discussed in Example 10.1, obtain the values 
of $a_{0,o}$, $b_{1,o}$, $a_{0,e}$, and $b_{1,e}$ for the cases where the power spectral density of the 
input, x(n), is given as: 

(i) $$\Phi_{xx}(e^{j\omega}) = \frac{1}{|1 - 0.3e^{-j\omega}|^2}$$

(ii) $$\Phi_{xx}(e^{j\omega}) = \frac{1}{|1 - 0.8e^{-j\omega}|^2}$$

From this study, you should find that when x(n) is colored, both $a_{0,e}$ and $b_{1,e}$ 
are biased with respect to the optimum Wiener coefficients $a_{0,o}$ and $b_{1,o}$. Explain 
how these biases are affected by the shape of $\Phi_{xx}(e^{j\omega})$. 

P10.3 Give a detailed derivation of Eq. (10.40). 

P10.4 Consider the case where the input to the line enhancer of Figure 10.5 is the sum 
of a sinusoid and a white noise as in Eq. (10.39). 

(i) Show that 
$$E[y^2(n)] = \frac{a^2}{2}\,|W(e^{j\theta_o})|^2 + \sigma_\nu^2\,\frac{1-s}{1+s}$$

(ii) For a given value of $\theta_o$ (say, $\theta_o = \pi/3$), and a few values of s (say, s = 0.25, 
0.5, and 0.75) plot $E[y^2(n)]$ as a function of w and observe that $E[y^2(n)]$ 
has only one maximum and this is achieved when $w = \cos\theta_o$. 

P10.5 Give a detailed derivation of Eq. (10.43). 

P10.6 Give a detailed derivation of the LMS algorithm of Table 10.2. 

P10.7 Give a detailed derivation of the LMS algorithm of Table 10.3. 

P10.8 In the light of the result of Problem P10.4, adjustment of the parameter w of 
the line enhancer of Figure 10.5 may be done by maximizing the mean-squared 
value of the output y(n). Develop an LMS algorithm, which works based on this 
principle. Also, develop another LMS algorithm that adapts $\theta = \cos^{-1} w$ instead 
of w, as in Algorithm 2 of Table 10.3. 

P10.9 Give a formal proof of the fact that the performance function $\xi(s, w)$ of Eq. 
(10.40), for a given s, has only one minimum point and that corresponds to the 
value of $w = \cos\theta_o$. 
P10.10 Study the transfer function between the input, x(n), and output, y(n), of 
Figure 10.10 and show that this is that of a filter with L narrow bands. 

P10.11 Study the transfer function between the input, x(n), and output error, $e_L(n)$, of 
Figure 10.10 and show that this is that of a filter with L notches. 

P10.12 In line enhancers, signal enhancement is defined as the ratio of the SNR at the 
enhancer output, to the SNR at its input. For the IIR ALE that was discussed in 
Section 10.3 show that when x(n) is given by Eq. (10.39) 

$$\text{Signal enhancement of IIR ALE} = \frac{1+s}{1-s}$$

P10.13 Work out a detailed derivation of Eq. (10.59). 

P10.14 For the magnetic recording channel, which is discussed in Section 10.4, show 
that the mean-squared value of the sum of residual ISI and noise at the equalizer 
output is $\sum_i (\eta_i - y_i)^2 + \sigma_v^2\sum_i w_i^2$. Thus, justify the use of definition Eq. 
(10.71). 

P10.15 From our discussion in Section 10.2, we recall that the output error, e(n), and 
equation error, e'(n), are related through the transfer function 1 - B(z). Assuming 
that a relatively good estimate of B(z) (say, $\hat{B}(z)$) is available, it is proposed 
that the setup of Figure P10.15 may be used to obtain better estimates of A(z) 
and B(z) compared with what could be achieved by the original equation error 
setup of Figure 10.4. Elaborate on this diagram and explain why this setup may 
give better estimates of A(z) and B(z), as compared with that in Figure 10.4. 
Also, develop an LMS algorithm for adaptation of A(z) and B(z) in this setup. 

Figure P10.15 (block diagram; it includes the plant noise, $e_o(n)$) 
Computer-Oriented Problems 

P10.16 Develop programs for implementation of the LMS algorithms of Problem P10.8. 
For an input signal consisting of a sinusoid in additive white noise, as in Eq. 
(10.39), compare the convergence behavior of these algorithms with Algorithms 
1 and 2 of Tables 10.2 and 10.3, respectively. The MATLAB programs for 
Algorithms 1 and 2 are available on an accompanying website. As a benchmark 
for your comparisons, you may try to generate results similar to those in 
Figure 10.9. 

P10.17 In magnetic recording, the best choice of the parameter K in the target response 
T(z) depends on the recording density, D. In this exercise we study how the 
choice of K varies with D. 

The program iirdsgn.m on an accompanying website allows you to design 
IIR equalizers in the application of magnetic recording. Different parameters 
of interest (such as recording density, channel SNR, and equalizer order) are 
inputs to the program. Use this program to design IIR equalizers for the densities 
D = 1.5 to 3, in steps of 0.25, and the choices of the parameter K = 1, 2, and 3. 
Assume a channel SNR of 30 dB and an equalizer with 3 zeros and 4 poles, in 
all your designs. However, each design has to be optimized with respect to the 
delay, $\tau$. The criterion for the optimum design is detection SNR. Tabulate your 
results and discuss how the choice of K varies with D. 


11 


Lattice Filters 


In our discussions on FIR and IIR filters in the previous chapters, we always limited 
ourselves to implementation structures which were direct realizations of their corresponding 
system functions. In this chapter, we introduce an alternative structure for realization 
of FIR and IIR filters. This new structure, which is called lattice, has a number of desirable 
properties that will become clear as we go along in this chapter. The lattice structure has 
most commonly been used for implementing linear predictors in the context of speech pro- 
cessing applications. Predictors may appear in two distinct forms: forward and backward. 
In a forward linear predictor, the aim is to estimate the present sample of a signal x(n) in 
terms of a linear combination of its past samples x(n — 1), x(n — 2), ..., x(n — m). This 
corresponds to one-step forward prediction of order m. In backward linear prediction, on 
the other hand, an estimate of x(n — m) is obtained as a linear combination of the future 
samples x(n), x(n — 1),...,x(n—m + 1). 

In this chapter, we start with a study of forward and backward linear predictors. We 
find that these two are closely related to each other. In particular, we introduce the so-called 
order-update equations, which state that an (m + 1)th-order linear prediction (forward 
or backward) of a signal sequence can be obtained as a linear combination of its mth-order 
forward and backward predictions. The order-update equations lead to a simple derivation 
of the lattice structure for forward and backward linear predictors. Other developments 
which follow this are the Levinson-Durbin algorithm (a computationally efficient procedure 
for solving Wiener-Hopf equations) and lattice structures for arbitrary FIR and IIR 
system functions. We also introduce the concept of autoregressive (AR) modeling of time 
series and use that for an efficient implementation of the LMS-Newton algorithm. 

Our discussion in this chapter is limited to the case where the filter tap weights, input 
and desired output are real-valued. Extension of this to the case of complex-valued signals 
is straightforward and is deferred to the problems at the end of this chapter. 


11.1 Forward Linear Prediction 


Figure 11.1 depicts the direct implementation of an mth-order forward linear predictor. 
A transversal filter with tap-input vector $\mathbf{x}_m(n-1) = [x(n-1) \; x(n-2) \; \cdots \; x(n-m)]^T$ 
and tap-weight vector $\mathbf{a}_m = [a_{m,1} \; a_{m,2} \; \cdots \; a_{m,m}]^T$ is used to obtain an estimate of the input 
sample x(n). We use the subscript m in vectors $\mathbf{a}_m$ and $\mathbf{x}_m(n)$ and the elements of $\mathbf{a}_m$ to 
emphasize that the predictor order is m. The implementation structure of the type shown 
in Figure 11.1 is called the transversal or tapped delay line predictor, in contrast to the 
lattice structure which will be introduced later. 

Figure 11.1 Forward linear predictor. 

We assume that the input sequence, x(n), is the realization of a stationary stochastic 
process. Furthermore, we assume that the predictor tap weights are optimized in the 
mean-square sense according to the Wiener filter theory. Thus, the optimum values of the 
predictor tap weights $a_{m,1}, a_{m,2}, \ldots, a_{m,m}$ are obtained by minimizing the function 

$$P_m^f = E[f_m^2(n)] \tag{11.1}$$

where 

$$f_m(n) = x(n) - \hat{x}_m^f(n) \tag{11.2}$$

is the forward prediction error and 

$$\hat{x}_m^f(n) = \sum_{i=1}^{m} a_{m,i}\,x(n-i) = \mathbf{a}_m^T\mathbf{x}_m(n-1) \tag{11.3}$$

is the mth-order forward prediction of the input sample x(n). This is a conventional 
Wiener filtering problem with the input vector $\mathbf{x}_m(n-1)$ and desired output x(n). Hence, 
the corresponding Wiener-Hopf equation is obtained by direct substitution of x(n) for 
d(n) and $\mathbf{x}_m(n-1)$ for $\mathbf{x}(n)$ in Eqs. (3.10) and (3.11), and recalling Eq. (3.24). The 
result is 

$$\mathbf{R}\mathbf{a}_{m,o} = \mathbf{r} \tag{11.4}$$

where $\mathbf{R} = E[\mathbf{x}_m(n-1)\mathbf{x}_m^T(n-1)]$, $\mathbf{r} = E[x(n)\mathbf{x}_m(n-1)]$ and $\mathbf{a}_{m,o}$ denotes the optimum 
value of $\mathbf{a}_m$. 

To simplify our notations in the discussion that follows, we assume that the predictor 
tap weights are always set to their optimum values and drop the extra subscript “o” from 
$\mathbf{a}_{m,o}$. Thus, Eq. (11.4) is simply written as 

$$\mathbf{R}\mathbf{a}_m = \mathbf{r} \tag{11.5}$$

When the predictor tap weights are set according to Eq. (11.5), $P_m^f$ is minimized and this 
minimum can be obtained using Eq. (3.26) as 

$$P_m^f = E[x^2(n)] - \mathbf{r}^T\mathbf{a}_m = E[x^2(n)] - \mathbf{r}^T\mathbf{R}^{-1}\mathbf{r} \tag{11.6}$$

assuming that $\mathbf{R}$ is nonsingular. 
For our later use, we define the autocorrelation function of the input process for lag k 
as 

$$r(k) = E[x(n)x(n-k)] \tag{11.7}$$

Using this definition, we note that 

$$\mathbf{R} = \begin{bmatrix}
r(0) & r(1) & \cdots & r(m-1) \\
r(1) & r(0) & \cdots & r(m-2) \\
\vdots & \vdots & \ddots & \vdots \\
r(m-1) & r(m-2) & \cdots & r(0)
\end{bmatrix} \tag{11.8}$$

and 

$$\mathbf{r} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(m) \end{bmatrix} \tag{11.9}$$

We note that $\mathbf{R}$ and $\mathbf{r}$, with the exception of r(0) and r(m), share the same set of elements. 
This very close relationship between $\mathbf{R}$ and $\mathbf{r}$ is the key to many interesting properties of 
the linear predictors, which are the focus of this chapter. 
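
As a small numerical illustration, the sketch below builds the Toeplitz system from a given autocorrelation sequence, solves it for the forward predictor, and evaluates the minimum prediction-error power as r(0) minus the inner product of r and the predictor. The function names are hypothetical, and the plain Gaussian-elimination solver is used only to keep the example self-contained.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def forward_predictor(r, m):
    """Return (a_m, P_m) from autocorrelation samples r[0], ..., r[m]."""
    R = [[r[abs(i - j)] for j in range(m)] for i in range(m)]   # Toeplitz matrix
    rhs = [r[k] for k in range(1, m + 1)]                       # [r(1), ..., r(m)]
    a = solve(R, rhs)
    P = r[0] - sum(ri * ai for ri, ai in zip(rhs, a))
    return a, P


# Example: r(k) = 0.8^k, the autocorrelation shape of a first-order AR process.
r = [0.8 ** k for k in range(4)]
a, P = forward_predictor(r, m=2)
print(a, P)
```

For this autocorrelation only the first tap is nonzero (up to rounding), since a first-order process gains nothing from a longer predictor.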


11.2 Backward Linear Prediction 


Figure 11.2 depicts an mth-order backward linear predictor. A transversal filter with 
tap-input vector $\mathbf{x}_m(n) = [x(n) \; x(n-1) \; \cdots \; x(n-m+1)]^T$ and tap-weight vector 
$\mathbf{g}_m = [g_{m,1} \; g_{m,2} \; \cdots \; g_{m,m}]^T$ is used to obtain an estimate of the input sample x(n - m). As in the 
forward prediction case, we assume that the backward predictor tap weights are optimized 
in the mean-square sense according to the Wiener filter theory. The optimum values of the 
predictor tap weights $g_{m,1}, g_{m,2}, \ldots, g_{m,m}$ are then obtained by minimizing the function 

$$P_m^b = E[b_m^2(n)] \tag{11.10}$$

where 

$$b_m(n) = x(n-m) - \hat{x}_m^b(n) \tag{11.11}$$

is the backward prediction error and 

$$\hat{x}_m^b(n) = \sum_{i=1}^{m} g_{m,i}\,x(n-i+1) = \mathbf{g}_m^T\mathbf{x}_m(n) \tag{11.12}$$

is the mth-order backward prediction of the input sample x(n - m). This is a conventional 
Wiener filtering problem with the input vector $\mathbf{x}_m(n)$ and desired output x(n - m). Hence, 
the corresponding Wiener-Hopf equation is obtained by direct substitution of x(n - m) 
for d(n) and $\mathbf{x}_m(n)$ for $\mathbf{x}(n)$ in Eqs. (3.10) and (3.11), and recalling Eq. (3.24). The 
result is 

$$\mathbf{R}\mathbf{g}_m = \mathbf{r}^b \tag{11.13}$$

where $\mathbf{R} = E[\mathbf{x}_m(n)\mathbf{x}_m^T(n)]$ and $\mathbf{r}^b = E[x(n-m)\mathbf{x}_m(n)]$. 

Figure 11.2 Backward linear predictor. 

Since x(n) is stationary, the correlation matrix $\mathbf{R}$ in Eq. (11.13) is the same matrix as 
that in Eq. (11.5). However, the vector $\mathbf{r}^b$ on the right-hand side of Eq. (11.13) is different 
from the vector $\mathbf{r}$ in Eq. (11.5). Using the definition (11.7), we obtain 

$$\mathbf{r}^b = \begin{bmatrix} r(m) \\ r(m-1) \\ \vdots \\ r(1) \end{bmatrix} \tag{11.14}$$

Comparing Eqs. (11.14) and (11.9), we note that $\mathbf{r}^b$ is the same as the vector $\mathbf{r}$ with its 
elements arranged in the reverse order. 
When the tap weights of the backward predictor are optimized according to Eq. (11.13), 

$$P_m^b = E[x^2(n-m)] - \mathbf{r}^{bT}\mathbf{g}_m = E[x^2(n-m)] - \mathbf{r}^{bT}\mathbf{R}^{-1}\mathbf{r}^b \tag{11.15}$$
11.3 Relationship Between Forward and Backward Predictors 


We now show that there is a close relationship between the tap-weight vectors of the 
forward and backward linear predictors of a process x(n). To see this, we substitute Eqs. 
(11.8) and (11.9) in Eq. (11.5) and write the result in scalar form as 

$$\sum_{i=1}^{m} r(i-j)\,a_{m,i} = r(j), \quad \text{for } j = 1, 2, \ldots, m \tag{11.16}$$

where we have used the property $r(i-j) = r(j-i)$. Also, substitution of Eqs. (11.8) 
and (11.14) into Eq. (11.13) gives 

$$\sum_{i=1}^{m} r(i-j)\,g_{m,i} = r(m+1-j), \quad \text{for } j = 1, 2, \ldots, m \tag{11.17}$$

Next, we let $i = m+1-k$ and $j = m+1-l$ in Eq. (11.17) and use $r(k-l) = r(l-k)$, 
to obtain 

$$\sum_{k=1}^{m} r(k-l)\,g_{m,m+1-k} = r(l), \quad \text{for } l = 1, 2, \ldots, m \tag{11.18}$$

Replacing k and l in Eq. (11.18) by i and j, respectively, and comparing the result with 
Eq. (11.16), we get 

$$a_{m,i} = g_{m,m+1-i}, \quad \text{for } i = 1, 2, \ldots, m \tag{11.19}$$

or 

$$g_{m,i} = a_{m,m+1-i}, \quad \text{for } i = 1, 2, \ldots, m \tag{11.20}$$

This result shows that the optimum tap weights of the mth-order forward predictor of 
a wide sense stationary process x(n) are the same as the optimum tap weights of the 
corresponding backward predictor, but in the reverse order. Thus, we may write 

$$f_m(n) = x(n) - \sum_{i=1}^{m} a_{m,i}\,x(n-i) \tag{11.21}$$

and 

$$b_m(n) = x(n-m) - \sum_{i=1}^{m} a_{m,m+1-i}\,x(n-i+1) \tag{11.22}$$
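
The reversal property is easy to confirm numerically: solving the forward and backward Wiener-Hopf systems separately for the same autocorrelation should give tap vectors that are mirror images of each other. The sketch below does this for a made-up autocorrelation sequence; all names and numbers are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def predictors(r, m):
    """Solve the forward and backward normal equations independently."""
    R = [[r[abs(i - j)] for j in range(m)] for i in range(m)]
    a = solve(R, [r[k] for k in range(1, m + 1)])   # right side [r(1), ..., r(m)]
    g = solve(R, [r[m - k] for k in range(m)])      # right side [r(m), ..., r(1)]
    return a, g


# A valid autocorrelation: sum of two first-order AR-type sequences.
r = [0.8 ** k + 0.5 * (-0.5) ** k for k in range(4)]
a, g = predictors(r, m=3)
print(a)
print(g)
```

The printed vectors agree element by element once one of them is reversed.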


11.4 Prediction-Error Filters 


The forward predictor of Figure 11.1 uses an m-tap transversal filter to get an estimate 
of the present sample x(n) of a sequence based on its past m samples $x(n-1), x(n-2), \ldots, x(n-m)$. 
The mth-order forward prediction-error filter for a sequence x(n) is 
defined as the filter whose input is x(n) and whose output is the forward prediction error $f_m(n)$. 
Figure 11.3 depicts a block schematic diagram showing how the forward predictor 
and the forward prediction-error filter are related. 

Figure 11.3 Block schematic diagram showing the relationship between forward predictor and 
forward prediction-error filter. 

Similarly, the mth-order backward prediction-error filter of a sequence x(n) is the one 
whose input is x(n) and whose output is the backward prediction error $b_m(n)$. Figure 11.4 
depicts a block schematic diagram showing how the backward predictor and the backward 
prediction-error filter are related. 

Figure 11.4 Block schematic diagram showing the relationship between backward predictor and 
backward prediction-error filter. 


11.5 Properties of Prediction Errors 


The forward and backward prediction errors possess certain properties that are funda- 
mental to the development of lattice structures. These properties are reviewed in this 
section. 


Property 1: For any sequence x(n), the forward and backward prediction errors of the 
same order have the same power. In other words, 

$$P_m^b = P_m^f \tag{11.23}$$

To show this, we note from Eq. (11.15), 

$$P_m^b = E[x^2(n-m)] - \sum_{i=1}^{m} g_{m,i}\,r(m+1-i) \tag{11.24}$$

Substituting Eq. (11.20) in Eq. (11.24) and noting that $E[x^2(n-m)] = E[x^2(n)]$ as x(n) 
is stationary, we obtain 

$$P_m^b = E[x^2(n)] - \sum_{i=1}^{m} a_{m,m+1-i}\,r(m+1-i) \tag{11.25}$$

The proof of Eq. (11.23) is now complete since the right-hand side of Eq. (11.25) with 
$j = m+1-i$ is the same as $P_m^f$ given by Eq. (11.6). 

This result shows that, for a random process x(n), the forward and backward predictors 
achieve the same level of minimum mean-squared error, when their tap weights are 
optimized. Noting this, we drop the superscripts f and b from $P_m^f$ and $P_m^b$, respectively, 
in the rest of this chapter. 


Property 2: For any sequence x(n) and its mth-order forward prediction error $f_m(n)$ 

$$E[f_m(n)x(n-k)] = 0, \quad \text{for } k = 1, 2, \ldots, m \tag{11.26}$$

This is easily proved by applying the principle of orthogonality to the forward predictor of 
Figure 11.1. Namely, the output error $f_m(n)$ is uncorrelated (orthogonal) with the samples 
$x(n-1), x(n-2), \ldots, x(n-m)$, at the filter (predictor) input. 


Property 3: For any sequence x(n) and its mth-order backward prediction error $b_m(n)$ 

$$E[b_m(n)x(n-k)] = 0, \quad \text{for } k = 0, 1, \ldots, m-1 \tag{11.27}$$

This is also proved by applying the principle of orthogonality to the backward predictor of 
Figure 11.2. Namely, the output error $b_m(n)$ is uncorrelated (orthogonal) with the samples 
$x(n), x(n-1), \ldots, x(n-m+1)$, at the filter (predictor) input. 


Property 4: The backward prediction errors $b_0(n), b_1(n), \ldots$ of a sequence x(n) are 
always uncorrelated with one another. In other words, for any $k \ne l$, 

$$E[b_k(n)b_l(n)] = 0 \tag{11.28}$$

To show this, with no loss of generality, we assume that $k < l$ and substitute for $b_k(n)$ 
from Eq. (11.22). This gives 

$$\begin{aligned}
E[b_k(n)b_l(n)] &= E\left[\left(x(n-k) - \sum_{i=1}^{k} a_{k,k+1-i}\,x(n-i+1)\right)b_l(n)\right] \\
&= E[x(n-k)b_l(n)] - \sum_{i=0}^{k-1} a_{k,k-i}\,E[x(n-i)b_l(n)]
\end{aligned} \tag{11.29}$$

Using Property 3 and noting that $k < l$, one finds that all the expectations on the right-hand 
side of Eq. (11.29) are zero. This completes the proof. 
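
Property 4 can be verified from the autocorrelation alone. In the sketch below, each backward prediction error is written as a linear combination of x(n), x(n-1), ..., and the cross-power of two such combinations is evaluated through r(i - j). The autocorrelation values and function names are assumptions made for this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def backward_error_filter(r, k):
    """Coefficients c with b_k(n) = sum_j c[j] x(n - j), j = 0, ..., k."""
    if k == 0:
        return [1.0]                                   # b_0(n) = x(n)
    R = [[r[abs(i - j)] for j in range(k)] for i in range(k)]
    g = solve(R, [r[k - i] for i in range(k)])         # right side [r(k), ..., r(1)]
    return [-gi for gi in g] + [1.0]


def cross_power(c1, c2, r):
    """E[u(n) v(n)] for u, v given as combinations of x(n - j)."""
    return sum(a * b * r[abs(i - j)]
               for i, a in enumerate(c1) for j, b in enumerate(c2))


r = [0.8 ** k + 0.5 * (-0.5) ** k for k in range(5)]
filters = [backward_error_filter(r, k) for k in range(4)]
gram = [[cross_power(ck, cl, r) for cl in filters] for ck in filters]
for row in gram:
    print([round(v, 10) for v in row])
```

The printed matrix is diagonal to within rounding; its diagonal holds the prediction-error powers of orders 0 to 3.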
11.6 Derivation of Lattice Structure 


In this section, we present a derivation of lattice structure for prediction-error filters. 
A distinct feature of lattice structure, as we will show in this section, is that it is a 
direct implementation of the order-update equations for computing mth-order forward 
and backward prediction errors from the forward and backward prediction errors of order 
m - 1. This is not possible in the transversal structure case. To derive these order-update 
equations and thereby the structure of lattice filters, we start with the forward prediction 
error for an (m + 1)th-order predictor: 

$$f_{m+1}(n) = x(n) - \sum_{i=1}^{m+1} a_{m+1,i}\,x(n-i) \tag{11.30}$$

The summation on the right-hand side of Eq. (11.30) can be rearranged as 

$$\sum_{i=1}^{m+1} a_{m+1,i}\,x(n-i) = \sum_{i=1}^{m} a_{m+1,i}\,x(n-i) + a_{m+1,m+1}\,x(n-m-1) \tag{11.31}$$

From Eq. (11.22), we get 

$$x(n-m-1) = b_m(n-1) + \sum_{i=1}^{m} a_{m,m+1-i}\,x(n-i) \tag{11.32}$$

Substituting Eq. (11.32) in Eq. (11.31), we obtain 

$$\begin{aligned}
\sum_{i=1}^{m+1} a_{m+1,i}\,x(n-i) &= \sum_{i=1}^{m} \left(a_{m+1,i} + a_{m+1,m+1}\,a_{m,m+1-i}\right)x(n-i) + a_{m+1,m+1}\,b_m(n-1) \\
&= \sum_{i=1}^{m} a'_{m,i}\,x(n-i) + \kappa_{m+1}\,b_m(n-1)
\end{aligned} \tag{11.33}$$

where 

$$\kappa_{m+1} = a_{m+1,m+1} \tag{11.34}$$

and 

$$a'_{m,i} = a_{m+1,i} + \kappa_{m+1}\,a_{m,m+1-i}, \quad \text{for } i = 1, 2, \ldots, m \tag{11.35}$$

The above development shows that any linear combination of the input samples 
$x(n-1), x(n-2), \ldots, x(n-m-1)$ can also be obtained as a linear combination of 
$x(n-1), x(n-2), \ldots, x(n-m)$ and $b_m(n-1)$. We also note that the summation on the 
left-hand side of Eq. (11.33) is the (m + 1)th-order forward prediction of x(n), that is, 
$\hat{x}_{m+1}^f(n)$. We may thus argue that the estimate $\hat{x}_{m+1}^f(n)$ can also be obtained as a linear 
combination of the past m samples of x(n) and the backward prediction error $b_m(n-1)$, 
that is, 

$$\hat{x}_{m+1}^f(n) = \sum_{i=1}^{m} a'_{m,i}\,x(n-i) + \kappa_{m+1}\,b_m(n-1) \tag{11.36}$$


i=l 
Then, the coefficients $a'_{m,i}$'s and $\kappa_{m+1}$ can be obtained directly by minimizing the 
mean-squared value of the estimation error 

$$f_{m+1}(n) = x(n) - \hat{x}_{m+1}^f(n) = x(n) - \sum_{i=1}^{m} a'_{m,i}\,x(n-i) - \kappa_{m+1}\,b_m(n-1) \tag{11.37}$$

To proceed, we define the vectors 

$$\mathbf{z}(n) = [\mathbf{x}_m^T(n-1) \;\; b_m(n-1)]^T \tag{11.38}$$

and 

$$\mathbf{w}_z = [\mathbf{a}'^T_m \;\; \kappa_{m+1}]^T \tag{11.39}$$

where 

$$\mathbf{a}'_m = [a'_{m,1} \; a'_{m,2} \; \cdots \; a'_{m,m}]^T \tag{11.40}$$

and $\mathbf{x}_m(n-1)$ is as defined in Section 11.1. Using these definitions, Eq. (11.37) can be 
written as 

$$f_{m+1}(n) = x(n) - \mathbf{w}_z^T\mathbf{z}(n) \tag{11.41}$$

Then, the tap-weight vector $\mathbf{w}_z$ which minimizes $f_{m+1}(n)$ in the mean-square sense can 
be obtained from the corresponding Wiener-Hopf equation 

$$\mathbf{R}_{zz}\mathbf{w}_z = \mathbf{p}_{zx} \tag{11.42}$$

where 

$$\mathbf{R}_{zz} = E[\mathbf{z}(n)\mathbf{z}^T(n)] \tag{11.43}$$

is the correlation matrix of the observation vector, $\mathbf{z}(n)$, and 

$$\mathbf{p}_{zx} = E[x(n)\mathbf{z}(n)] \tag{11.44}$$

is the cross-correlation between $\mathbf{z}(n)$ and the desired output, x(n). 
Substituting Eq. (11.38) in Eq. (11.43) and using the definition (11.7) and Property 3 
of prediction errors, that is, Eq. (11.27), we obtain 

$$\mathbf{R}_{zz} = \begin{bmatrix}
r(0) & r(1) & \cdots & r(m-1) & 0 \\
r(1) & r(0) & \cdots & r(m-2) & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
r(m-1) & r(m-2) & \cdots & r(0) & 0 \\
0 & 0 & \cdots & 0 & E[b_m^2(n-1)]
\end{bmatrix} \tag{11.45}$$

We note that the m-by-m portion of the upper-left part of $\mathbf{R}_{zz}$ is nothing but the correlation 
matrix $\mathbf{R}$ of Eq. (11.8). Thus, we may write 

$$\mathbf{R}_{zz} = \begin{bmatrix} \mathbf{R} & \mathbf{0}_m \\ \mathbf{0}_m^T & E[b_m^2(n-1)] \end{bmatrix} \tag{11.46}$$

where $\mathbf{0}_m$ denotes the length m zero column vector. Similarly, it is straightforward to 
show that 

$$\mathbf{p}_{zx} = \begin{bmatrix} \mathbf{r} \\ E[x(n)b_m(n-1)] \end{bmatrix} \tag{11.47}$$

where $\mathbf{r}$ is the column vector defined in Eq. (11.9). 
Substituting Eqs. (11.39), (11.46), and (11.47) in Eq. (11.42) and solving for $\kappa_{m+1}$ and 
$\mathbf{a}'_m$, we obtain 

$$\kappa_{m+1} = \frac{E[x(n)b_m(n-1)]}{E[b_m^2(n-1)]} \tag{11.48}$$

and 

$$\mathbf{a}'_m = \mathbf{R}^{-1}\mathbf{r} \tag{11.49}$$

Comparing Eq. (11.49) with Eq. (11.5), one finds that 

$$\mathbf{a}'_m = \mathbf{a}_m \tag{11.50}$$

Substituting this result in Eq. (11.37), we get 

$$f_{m+1}(n) = x(n) - \mathbf{a}_m^T\mathbf{x}_m(n-1) - \kappa_{m+1}\,b_m(n-1) = f_m(n) - \kappa_{m+1}\,b_m(n-1) \tag{11.51}$$

where use is made of Eq. (11.21). Thus, the (m + 1)th-order forward prediction error can 
be obtained from the mth-order forward and backward prediction errors. 
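
The same order update can be carried out on the predictor coefficients themselves. The sketch below assumes only the relations derived above: the new last coefficient equals kappa_{m+1}, the remaining ones are a_{m,i} - kappa_{m+1} a_{m,m+1-i}, and kappa_{m+1} is E[x(n) b_m(n-1)] divided by the order-m error power. The power update P(1 - kappa^2) follows by squaring the forward-error recursion. Names and numbers are illustrative.

```python
def order_update(a, P, r):
    """Go from the order-m predictor (a, P) to order m + 1.

    a[i - 1] holds a_{m,i}; r[k] holds the autocorrelation at lag k.
    """
    m = len(a)
    # E[x(n) b_m(n-1)] = r(m+1) - sum_{i=1..m} a_{m,m+1-i} r(i)
    num = r[m + 1] - sum(a[m - i] * r[i] for i in range(1, m + 1))
    kappa = num / P
    new_a = [a[i] - kappa * a[m - 1 - i] for i in range(m)] + [kappa]
    return new_a, P * (1.0 - kappa * kappa), kappa


r = [0.8 ** k + 0.5 * (-0.5) ** k for k in range(5)]
a, P, kappas = [], r[0], []
for _ in range(3):
    a, P, kappa = order_update(a, P, r)
    kappas.append(kappa)

# Check the result against the order-3 normal equations R a = r.
residual = [sum(r[abs(i - j)] * a[j] for j in range(3)) - r[i + 1] for i in range(3)]
print(a, P, kappas)
print(residual)
```

Starting from order zero (no taps, P equal to r(0)), three updates give a predictor that satisfies the order-3 normal equations to within rounding, without solving any linear system.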

Following a similar procedure as above, a similar recursion for the backward prediction 
error can be derived. It is given by (Problem P11.1) 

$$b_{m+1}(n) = b_m(n-1) - \kappa'_{m+1}\,f_m(n) \tag{11.52}$$

where 

$$\kappa'_{m+1} = \frac{E[x(n-m-1)f_m(n)]}{E[f_m^2(n)]} \tag{11.53}$$


We now show that the two quantities given for $\kappa_{m+1}$ in Eq. (11.48) and $\kappa'_{m+1}$ in Eq.
(11.53) are the same. Consider Eq. (11.48). Using Eq. (11.21), we can write
$$\begin{aligned}
E[x(n) b_m(n-1)] &= E\left[\left(f_m(n) + \sum_{i=1}^{m} a_{m,i}\, x(n-i)\right) b_m(n-1)\right] \\
&= E[f_m(n) b_m(n-1)] + \sum_{i=1}^{m} a_{m,i} E[x(n-i) b_m(n-1)]
\end{aligned} \tag{11.54}$$
We note from Eq. (11.27), with $n$ replaced by $n-1$, that all the expectations under the
summation on the right-hand side of Eq. (11.54) are 0. Thus, we obtain
$$E[x(n) b_m(n-1)] = E[f_m(n) b_m(n-1)] \tag{11.55}$$
Similarly, one can easily show that
$$E[x(n-m-1) f_m(n)] = E[f_m(n) b_m(n-1)] \tag{11.56}$$
The results in Eqs. (11.55) and (11.56) show that the numerators of the two expressions
on the right-hand sides of Eqs. (11.48) and (11.53) are the same. The denominators of
these expressions are also the same, as $E[f_m^2(n)] = P_m^f$, $E[b_m^2(n-1)] = E[b_m^2(n)] = P_m^b$,
and according to Property 1 of prediction errors, $P_m^f = P_m^b$. Thus, we have established
that the quantities $\kappa_{m+1}$ in Eq. (11.48) and $\kappa'_{m+1}$ in Eq. (11.53) are the same. This, of
course, is true only when the predictor coefficients are optimum.


We may also write
$$\kappa_{m+1} = \frac{E[f_m(n) b_m(n-1)]}{\sqrt{E[f_m^2(n)]\, E[b_m^2(n-1)]}} \tag{11.57}$$
Thus, $\kappa_{m+1}$ is the normalized correlation between the forward and backward errors $f_m(n)$
and $b_m(n-1)$. In fact, $\kappa_{m+1}$ is known as the partial correlation (PARCOR) coefficient
as it represents the correlation that remains between the forward and backward prediction
errors. Using the Cauchy-Schwarz inequality,¹ it is straightforward to show that the
following inequality is always true:
$$|\kappa_{m+1}| \le 1 \tag{11.58}$$
We may also write
$$\kappa_{m+1} = \frac{E[f_m(n) b_m(n-1)]}{P_m} \tag{11.59}$$
since $E[f_m^2(n)] = E[b_m^2(n-1)] = P_m$, according to Property 1 of prediction errors.


Summarizing the order-update equations derived above for the prediction errors, we have
$$f_{m+1}(n) = f_m(n) - \kappa_{m+1} b_m(n-1) \tag{11.60}$$
$$b_{m+1}(n) = b_m(n-1) - \kappa_{m+1} f_m(n) \tag{11.61}$$
where $m = 0, 1, 2, \ldots$, and $\kappa_{m+1}$ is given by Eq. (11.57) or (11.59). Since $x(n)$ may be
considered as the zeroth-order forward or backward prediction error, the initialization
for the above recursions is given by
$$f_0(n) = b_0(n) = x(n) \tag{11.62}$$


The structure that implements the above recursions is called the lattice filter/predictor.
Figure 11.5a shows the lattice structure of an M-stage forward/backward predictor.
Each stage has two inputs. These are the forward and backward prediction errors from
the previous stage. The outputs of each stage are the forward and backward prediction
errors of one order higher. These are calculated according to the order-update Eqs. (11.60)
and (11.61). The two inputs to Stage 1 are common and are equal to the predictor input
x(n). Figure 11.5b depicts the details of the mth stage of the lattice predictor. It follows
from the order-update equations (11.60) and (11.61).

¹ The Cauchy-Schwarz inequality states that for any set of numbers $\{a_i$ and $b_i$, for $i = 1, 2, \ldots, L\}$,
$$\left|\sum_{i=1}^{L} a_i b_i\right| \le \sqrt{\sum_{i=1}^{L} a_i^2}\, \sqrt{\sum_{i=1}^{L} b_i^2}$$

Figure 11.5 Lattice predictor: (a) overall structure and (b) details of Stage m.

A special feature of the lattice predictor is that to obtain the Mth-order prediction errors,
all prediction errors (forward and backward) of lower orders are also calculated. In other
words, the Mth-order lattice predictor is a structure with a single input x(n) and 2M + 2
outputs $f_0(n), b_0(n), f_1(n), b_1(n), \ldots, f_M(n)$, and $b_M(n)$. How many of these outputs will
be used is application dependent. For example, in an Mth-order forward predictor where
the final goal is $f_M(n)$, the rest of the prediction errors (with the exception of $b_M(n)$,
which, in this case, can be dropped from the structure) are required only as intermediate
signal sequences.
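
As a rough illustration of the recursions (11.60) to (11.62), the following Python sketch runs an M-stage lattice predictor over a short input sequence and keeps all 2M + 2 output sequences. The function name `lattice_predictor`, the list layout, and the zero initial state (x(n) = 0 for n < 0) are assumptions made for this example only; the PARCOR value 0.5 is arbitrary rather than optimum for any particular process.

```python
def lattice_predictor(x, kappas):
    """Return (f, b) with f[m][n] and b[m][n] for orders m = 0..M.

    kappas[m] holds kappa_{m+1}; x(n) is taken as 0 for n < 0.
    """
    M = len(kappas)
    f = [[0.0] * len(x) for _ in range(M + 1)]
    b = [[0.0] * len(x) for _ in range(M + 1)]
    b_prev = [0.0] * (M + 1)                 # b_m(n - 1) for every order m
    for n, sample in enumerate(x):
        f[0][n] = b[0][n] = float(sample)    # Eq. (11.62)
        for m, k in enumerate(kappas):
            f[m + 1][n] = f[m][n] - k * b_prev[m]    # Eq. (11.60)
            b[m + 1][n] = b_prev[m] - k * f[m][n]    # Eq. (11.61)
        b_prev = [b[m][n] for m in range(M + 1)]     # delay line for next sample
    return f, b


f, b = lattice_predictor([1.0, 2.0, 3.0], [0.5])
print(f[1])   # first-order forward prediction errors
print(b[1])   # first-order backward prediction errors
```

Each input sample costs two multiplications per stage, which is the 2M-type operation count referred to later when the lattice is compared with a bank of parallel prediction-error filters.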

A useful relationship which we shall establish before ending this section is an order- 
update equation for the mean-squared value of the prediction errors. We note that 


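$$\begin{aligned}
P_{m+1} &= E[f_{m+1}^2(n)] \\
&= E\left[f_{m+1}(n)\left(x(n) - \sum_{i=1}^{m+1} a_{m+1,i}\, x(n-i)\right)\right] \\
&= E[f_{m+1}(n) x(n)] - \sum_{i=1}^{m+1} a_{m+1,i} E[f_{m+1}(n) x(n-i)]
\end{aligned} \tag{11.63}$$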
But from Property 2 of prediction errors we know that all the expectations under the
summation on the right-hand side of Eq. (11.63) are zero. Thus we obtain
$$P_{m+1} = E[f_{m+1}(n) x(n)] \tag{11.64}$$
Substituting for $f_{m+1}(n)$ and $x(n)$ in Eq. (11.64) from Eqs. (11.60) and (11.21), respectively,
we get
$$\begin{aligned}
P_{m+1} &= E\left[\left(f_m(n) - \kappa_{m+1} b_m(n-1)\right)\left(f_m(n) + \sum_{i=1}^{m} a_{m,i}\, x(n-i)\right)\right] \\
&= E[f_m^2(n)] - \kappa_{m+1} E[f_m(n) b_m(n-1)]
\end{aligned} \tag{11.65}$$
since $E[f_m(n) x(n-i)] = E[b_m(n-1) x(n-i)] = 0$, for $i = 1, 2, \ldots, m$, by Properties
2 and 3 of prediction errors. Substituting for $E[f_m(n) b_m(n-1)]$ from Eq. (11.59) and
noting that $E[f_m^2(n)] = P_m$, we get
$$P_{m+1} = (1 - \kappa_{m+1}^2) P_m \tag{11.66}$$


This result shows that the mean-squared value of the prediction error never increases as the
order of the predictor increases, since $|\kappa_{m+1}| \le 1$. This, of course, is intuitively
understandable. The contribution
of each stage in reducing the prediction error is determined by its PARCOR coefficient,
according to Eq. (11.66). A PARCOR coefficient with magnitude close to one reduces the
prediction error significantly. On the other hand, a PARCOR coefficient with small magnitude
has little effect in reducing the prediction error. Intuitively, we expect the
prediction error to decrease rapidly for the first few stages and slowly for the later stages.
This is equivalent to saying that the PARCOR coefficients are likely to be relatively larger
(in magnitude) for the first few stages and drop to some values close to 0 at later stages.
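
To make the effect of Eq. (11.66) concrete, the short Python sketch below propagates the prediction-error power through three stages. The PARCOR values 0.9, 0.5, and 0.1 and the unit input power are made-up numbers chosen only to show the trend; the function name is likewise hypothetical.

```python
def prediction_error_powers(p0, kappas):
    """Return [P_0, P_1, ..., P_M] using P_{m+1} = (1 - kappa_{m+1}^2) P_m."""
    powers = [p0]
    for k in kappas:
        powers.append((1.0 - k * k) * powers[-1])   # Eq. (11.66)
    return powers


powers = prediction_error_powers(1.0, [0.9, 0.5, 0.1])
for order, value in enumerate(powers):
    print(order, round(value, 6))
```

With these values the first stage removes 81% of the input power, the second removes a quarter of what is left, and the third removes only 1%, which matches the qualitative behavior described above.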


11.7 Lattice as an Orthogonalization Transform 


An important feature of the lattice predictor structure of Figure 11.5a, in the context of 
adaptive filters, is that it may be viewed as an orthogonalization transform. Furthermore, 
as we shall see later, the PARCOR coefficients which are central to the lattice structure 
can be obtained adaptively. So, the lattice predictor structure may be used for adaptive 
implementation of orthogonalization in the transform domain adaptive filters discussed in 
Chapter 7. 

Before looking at the adaptive techniques for implementation of such orthogonalization, 
let us assume that the optimum PARCOR coefficients of the lattice structure are known and 
the corresponding prediction errors can be calculated. We also define the column vector 


$$\mathbf{b}(n) = [b_0(n)\ \ b_1(n)\ \ \cdots\ \ b_{N-1}(n)]^{\mathrm{T}} \tag{11.67}$$


whose elements are the backward prediction errors of orders 0 to N — 1. The vector 
b(n) may be obtained through the lattice predictor of Figure 11.5a, with M = N — 1, or, 
equivalently, according to the equation 


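$$\mathbf{b}(n) = \mathbf{L}\mathbf{x}(n) \tag{11.68}$$
where
$$\mathbf{x}(n) = [x(n)\ \ x(n-1)\ \ \cdots\ \ x(n-N+1)]^{\mathrm{T}} \tag{11.69}$$
and
$$\mathbf{L} = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
-a_{1,1} & 1 & 0 & \cdots & 0 \\
-a_{2,2} & -a_{2,1} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-a_{N-1,N-1} & -a_{N-1,N-2} & -a_{N-1,N-3} & \cdots & 1
\end{bmatrix} \tag{11.70}$$
with $a_{i,j}$ denoting the jth coefficient of the ith-order forward predictor. We note that the
matrix $\mathbf{L}$ is invertible, since $\det(\mathbf{L}) \ne 0$ (Problem P11.3). Hence, Eq. (11.68) can also be
written as
$$\mathbf{x}(n) = \mathbf{L}^{-1}\mathbf{b}(n) \tag{11.71}$$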


Figure 11.6 depicts a block schematic obtained from Eq. (11.68). It consists of N
prediction-error filters of orders 0 to $N-1$, in parallel. Compared to the lattice structure,
this suggests a more direct way of converting (transforming) the input vector x(n)
to the backward prediction errors $b_0(n), b_1(n), \ldots, b_{N-1}(n)$. It also resembles the idea
of transformation in the context of the transform domain adaptive filters discussed in
Chapter 7. However, it requires more computations compared to the lattice predictor. The
lattice predictor requires only about 2N multiplications and 2N additions/subtractions to
update all the backward prediction errors once, while the computational complexity of
the direct implementation presented in Figure 11.6 is about $N^2/2$ multiplications and a
similar number of additions/subtractions for every input sample. However, in the following
discussions we use Eq. (11.68) as an expression for the vector b(n), because it will help
in developing certain theoretical results. In actual implementation, of course, one can use
the corresponding lattice structure.
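
The equivalence between the lattice and the direct form of Eq. (11.68) can be checked numerically. The Python sketch below builds the matrix L of Eq. (11.70) from a set of PARCOR coefficients, using the order-update relations that are derived later as Eqs. (11.84) and (11.85), and compares L x(n) with the backward errors produced by the lattice recursions. All function names, the PARCOR values, and the test signal are illustrative assumptions; a zero pre-history (x(n) = 0 for n < 0) is assumed in both computations.

```python
def predictor_coefficients(kappas):
    """Return [a_1, ..., a_M], where a_m = [a_{m,1}, ..., a_{m,m}]."""
    orders, a = [], []
    for k in kappas:
        m = len(a)
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
        orders.append(a)
    return orders


def build_L(kappas):
    """Lower-triangular matrix of Eq. (11.70) as a list of rows."""
    N = len(kappas) + 1
    L = [[0.0] * N for _ in range(N)]
    L[0][0] = 1.0
    for m, a in enumerate(predictor_coefficients(kappas), start=1):
        for i in range(m):
            L[m][i] = -a[m - 1 - i]      # coefficient of x(n - i) is -a_{m,m-i}
        L[m][m] = 1.0
    return L


def backward_errors_direct(x, kappas):
    """b(n) = L x(n) for every n, Eq. (11.68)."""
    L = build_L(kappas)
    N = len(L)
    out = []
    for n in range(len(x)):
        xv = [x[n - i] if n - i >= 0 else 0.0 for i in range(N)]
        out.append([sum(L[m][i] * xv[i] for i in range(N)) for m in range(N)])
    return out


def backward_errors_lattice(x, kappas):
    """Same quantities from the lattice recursions (11.60) and (11.61)."""
    b_prev = [0.0] * (len(kappas) + 1)
    out = []
    for sample in x:
        f, b = [float(sample)], [float(sample)]
        for m, k in enumerate(kappas):
            f.append(f[m] - k * b_prev[m])
            b.append(b_prev[m] - k * f[m])
        b_prev = b
        out.append(b)
    return out


x = [1.0, -2.0, 0.5, 3.0, 1.5]
kappas = [0.5, -0.3, 0.2]
direct = backward_errors_direct(x, kappas)
lattice = backward_errors_lattice(x, kappas)
print(max(abs(p - q) for u, v in zip(direct, lattice) for p, q in zip(u, v)))
```

The two computations agree for any choice of PARCOR coefficients, optimum or not; the orthogonality of the elements of b(n), however, holds only when the coefficients are optimum for the input process.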


[Figure 11.6: N backward prediction-error filters of orders 0 to N - 1 in parallel, with outputs $b_0(n), b_1(n), \ldots, b_{N-1}(n)$.]

Figure 11.6 Block schematic diagram for a direct implementation of Eq. (11.68).


11.8 Lattice Joint Process Estimator


In the previous sections, our discussion on lattice structure was limited to its use as a 
prediction-error filter. In this section, we show how a general transversal filter, which 
is used to estimate a desired sequence d(n) from another related sequence x(n), can be 
implemented using lattice structure. 
Consider a transversal filter with tap-input vector $\mathbf{x}(n)$, as in Eq. (11.69), tap-weight
vector
$$\mathbf{w} = [w_0\ \ w_1\ \ \cdots\ \ w_{N-1}]^{\mathrm{T}} \tag{11.72}$$
output
$$y(n) = \mathbf{w}^{\mathrm{T}}\mathbf{x}(n) \tag{11.73}$$
and desired output d(n). Substituting Eq. (11.71) in Eq. (11.73), we obtain
$$y(n) = \mathbf{w}^{\mathrm{T}}\mathbf{L}^{-1}\mathbf{b}(n) \tag{11.74}$$
We define the column vector
$$\mathbf{c} = \mathbf{L}^{-\mathrm{T}}\mathbf{w} \tag{11.75}$$
where $\mathbf{L}^{-\mathrm{T}}$ is shorthand notation for $(\mathbf{L}^{\mathrm{T}})^{-1}$ or $(\mathbf{L}^{-1})^{\mathrm{T}}$ (note that $(\mathbf{L}^{\mathrm{T}})^{-1} = (\mathbf{L}^{-1})^{\mathrm{T}}$). Then,
Eq. (11.74) simplifies to
$$y(n) = \mathbf{c}^{\mathrm{T}}\mathbf{b}(n) \tag{11.76}$$
This result shows that the output y(n) of the transversal filter can equivalently be obtained
as a linear combination of the backward prediction errors. This also suggests an alternative
structure for implementation of the system function of a transversal filter. Figure 11.7
depicts such an implementation. This is referred to as the lattice joint process estimator.


Figure 11.7 Lattice joint process estimator.

The lattice joint process estimator consists of two distinct parts: the lattice predictor part
and the linear combiner part.
The lattice predictor part is used to convert the samples of the input signal to backward
prediction errors. The linear combiner part uses the backward prediction errors to obtain
the filter output according to Eq. (11.76).

We note that once the backward predictor coefficients are known, the coefficients of
the linear combiner part of the lattice joint process estimator are uniquely determined
through Eq. (11.75), provided the tap-weight vector w of the corresponding transversal
filter is known. Furthermore, the existence of $\mathbf{c}$ is guaranteed as $\mathbf{L}$ is invertible (Problem
P11.3). The optimum values of the PARCOR coefficients in the lattice part of Figure 11.7
are determined from the statistics of the input sequence, x(n).
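
Because L is lower triangular with a unit diagonal, Eq. (11.75) can be evaluated without an explicit matrix inverse. The Python sketch below solves $\mathbf{L}^{\mathrm{T}}\mathbf{c} = \mathbf{w}$ by back substitution and checks that $\mathbf{w}^{\mathrm{T}}\mathbf{x}(n)$ and $\mathbf{c}^{\mathrm{T}}\mathbf{b}(n)$ coincide. The 3-by-3 matrix, the tap weights, and the input vector are made-up numbers for illustration, and the function name is hypothetical.

```python
def combiner_coefficients(L, w):
    """Solve L^T c = w for c, with L unit lower triangular (Eq. 11.75)."""
    N = len(w)
    c = [0.0] * N
    for i in range(N - 1, -1, -1):
        # Row i of L^T c = w reads c_i + sum_{m > i} L[m][i] * c_m = w_i.
        c[i] = w[i] - sum(L[m][i] * c[m] for m in range(i + 1, N))
    return c


L = [[1.0, 0.0, 0.0],
     [-0.5, 1.0, 0.0],
     [0.2, -0.6, 1.0]]
w = [1.0, 2.0, 3.0]
c = combiner_coefficients(L, w)

x = [1.0, -1.0, 2.0]                                          # tap-input vector x(n)
b = [sum(L[m][i] * x[i] for i in range(3)) for m in range(3)]  # b(n) = L x(n)
y_transversal = sum(wi * xi for wi, xi in zip(w, x))          # Eq. (11.73)
y_lattice = sum(ci * bi for ci, bi in zip(c, b))              # Eq. (11.76)
print(c, y_transversal, y_lattice)
```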


11.9 System Functions 


In this section, we present a system function view of the lattice structure. This will be useful
for our analyses in the following sections. We define $H_{f_m}(z)$ and $H_{b_m}(z)$ as the transfer
functions relating the input sequence x(n) and the mth-order forward and backward
prediction errors $f_m(n)$ and $b_m(n)$, respectively. Then, it follows from Eqs. (11.60) and
(11.61) that
$$H_{f_{m+1}}(z) = H_{f_m}(z) - \kappa_{m+1} z^{-1} H_{b_m}(z) \tag{11.77}$$
and
$$H_{b_{m+1}}(z) = z^{-1} H_{b_m}(z) - \kappa_{m+1} H_{f_m}(z) \tag{11.78}$$
These are order-update equations which may be used to obtain the system functions of
forward and backward prediction-error filters of any order in terms of the system functions
of one order lower prediction-error filters. The initial conditions to start these order-update
equations are $H_{f_0}(z) = H_{b_0}(z) = 1$.

We also note that the system functions $H_{f_m}(z)$ and $H_{b_m}(z)$ may directly be realized
using the expressions
$$H_{f_m}(z) = 1 - \sum_{i=1}^{m} a_{m,i} z^{-i} \tag{11.79}$$
and
$$H_{b_m}(z) = z^{-m} - \sum_{i=1}^{m} a_{m,i} z^{-(m-i)} \tag{11.80}$$
which follow directly from Eqs. (11.21) and (11.22), respectively. Also, for our later
reference, we note that $H_{f_m}(z)$ and $H_{b_m}(z)$ are related according to the equation
$$H_{b_m}(z) = z^{-m} H_{f_m}(z^{-1}) \tag{11.81}$$
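
The order updates (11.77) and (11.78) are easy to carry out on coefficient lists. In the Python sketch below a system function is stored as the list of its coefficients in increasing powers of $z^{-1}$, so a delay is a prepended zero. The function name and the PARCOR values are assumptions made for this illustration. Relation (11.81) then shows up as the backward coefficient list being the forward list in reverse order.

```python
def system_functions(kappas):
    """Return [(H_f0, H_b0), (H_f1, H_b1), ...] as coefficient lists in z^-1."""
    hf, hb = [1.0], [1.0]                    # H_f0(z) = H_b0(z) = 1
    result = [(hf, hb)]
    for k in kappas:
        delayed_hb = [0.0] + hb              # z^-1 H_bm(z)
        padded_hf = hf + [0.0]               # H_fm(z), padded to the same length
        new_hf = [p - k * q for p, q in zip(padded_hf, delayed_hb)]   # Eq. (11.77)
        new_hb = [q - k * p for p, q in zip(padded_hf, delayed_hb)]   # Eq. (11.78)
        hf, hb = new_hf, new_hb
        result.append((hf, hb))
    return result


for order, (hf, hb) in enumerate(system_functions([0.5, -0.3])):
    print(order, hf, hb)
```

For the second-order case the forward list is approximately [1, -0.65, 0.3], that is, $a_{2,1} = 0.65$ and $a_{2,2} = -0.3$ in the notation of Eq. (11.79).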


11.10 Conversions 


From the results in the previous sections, and in particular the system function presentation
in the last section, one may conclude that there is a close relationship between
the PARCOR coefficients, $\kappa_m$'s, and the transversal predictor coefficients, $a_{m,i}$'s. In this
section, we present procedures for conversion between these two sets of coefficients.


11.10.1 Conversion Between Lattice and Transversal Predictors


Given the PARCOR coefficients of a lattice predictor, the coefficients of the corresponding
transversal structure can be calculated. This follows from the order-update equations
(11.60) and (11.61) or (11.77) and (11.78). It is done by starting with the initial condition
(system functions) $H_{f_0}(z) = H_{b_0}(z) = 1$ and iterating Eqs. (11.77) and (11.78) until the
required order is reached. In particular, substituting Eqs. (11.79) and (11.80) in Eq. (11.77),
we get
$$1 - \sum_{i=1}^{m+1} a_{m+1,i} z^{-i} = 1 - \sum_{i=1}^{m} a_{m,i} z^{-i} - \kappa_{m+1} z^{-1}\left(z^{-m} - \sum_{i=1}^{m} a_{m,m+1-i} z^{-i+1}\right) \tag{11.82}$$
Rearranging this, we obtain
$$\sum_{i=1}^{m+1} a_{m+1,i} z^{-i} = \sum_{i=1}^{m} (a_{m,i} - \kappa_{m+1} a_{m,m+1-i}) z^{-i} + \kappa_{m+1} z^{-m-1} \tag{11.83}$$
Equating the coefficients of similar powers of z on both sides of Eq. (11.83), we get
$$a_{m+1,i} = a_{m,i} - \kappa_{m+1} a_{m,m+1-i}, \quad \text{for } i = 1, 2, \ldots, m \tag{11.84}$$
and
$$a_{m+1,m+1} = \kappa_{m+1} \tag{11.85}$$
In order to obtain the coefficients of an Mth-order transversal predictor, we start with
the zeroth-order initial condition (equivalent to $H_{f_0}(z) = H_{b_0}(z) = 1$) and iterate Eqs. (11.84)
and (11.85) M times. Table 11.1 summarizes this procedure.

Next, we derive a procedure for calculating the PARCOR coefficients $\kappa_1, \kappa_2, \ldots, \kappa_M$
from the coefficients $a_{M,1}, a_{M,2}, \ldots, a_{M,M}$ of an Mth-order transversal predictor. Consider
Eq. (11.84) for a particular value of $i$ and also when $i$ is replaced by $m+1-i$. We get
the following pair of simultaneous equations:
$$\begin{aligned}
a_{m,i} - \kappa_{m+1} a_{m,m+1-i} &= a_{m+1,i} \\
a_{m,m+1-i} - \kappa_{m+1} a_{m,i} &= a_{m+1,m+1-i}
\end{aligned} \tag{11.86}$$


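Table 11.1 Conversion from lattice to transversal predictor.

Given:    $\kappa_1, \kappa_2, \ldots, \kappa_M$
Required: $a_{M,1}, a_{M,2}, \ldots, a_{M,M}$

$a_{1,1} = \kappa_1$
for m = 1 to M - 1
    $a_{m+1,i} = a_{m,i} - \kappa_{m+1} a_{m,m+1-i}$, for $i = 1, 2, \ldots, m$
    $a_{m+1,m+1} = \kappa_{m+1}$
end
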
Table 11.2 Conversion from transversal to lattice predictor
(inverse Levinson-Durbin algorithm).

Given:    $a_{M,1}, a_{M,2}, \ldots, a_{M,M}$
Required: $\kappa_1, \kappa_2, \ldots, \kappa_M$

$\kappa_M = a_{M,M}$
for m = M - 1, M - 2, ..., 1
    $a_{m,i} = \dfrac{a_{m+1,i} + \kappa_{m+1} a_{m+1,m+1-i}}{1 - \kappa_{m+1}^2}$, for $i = 1, 2, \ldots, m$
    $\kappa_m = a_{m,m}$
end


Solving these for $a_{m,i}$, we get
$$a_{m,i} = \frac{a_{m+1,i} + \kappa_{m+1} a_{m+1,m+1-i}}{1 - \kappa_{m+1}^2}, \quad \text{for } i = 1, 2, \ldots, m \tag{11.87}$$
This, together with Eq. (11.85), suggests the procedure presented in Table 11.2 for calculating the
PARCOR coefficients from the coefficients of the corresponding transversal predictor. This
procedure is known as the inverse Levinson-Durbin algorithm. It may be noted that although
we are only interested in the PARCOR coefficients, the coefficients of the transversal filters
of orders $m = 1, 2, \ldots, M-1$ are also obtained as intermediate results. Thus, given the
coefficients of an Mth-order transversal predictor, the coefficients of the lower order
transversal predictors are obtained by following the procedure provided in Table 11.2. In
other words, given the last row of the matrix L of Eq. (11.70), for $M = N-1$, one can
build the whole matrix L by following the inverse Levinson-Durbin algorithm.
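
The two conversion procedures of Tables 11.1 and 11.2 can be sketched in Python as follows. The function names are hypothetical, lists are zero-based (so `a[i - 1]` holds $a_{m,i}$), and the PARCOR values in the example are arbitrary. The step-down direction divides by $1 - \kappa_{m+1}^2$ and therefore assumes that no intermediate PARCOR coefficient has unit magnitude.

```python
def parcor_to_transversal(kappas):
    """Table 11.1: kappa_1..kappa_M -> [a_{M,1}, ..., a_{M,M}]."""
    a = []
    for k in kappas:                         # k plays the role of kappa_{m+1}
        m = len(a)
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]   # (11.84), (11.85)
    return a


def transversal_to_parcor(a):
    """Table 11.2 (inverse Levinson-Durbin): a_{M,i} -> kappa_1..kappa_M."""
    a = list(a)
    kappas = []
    while a:
        k = a[-1]                            # kappa_{m+1} = a_{m+1,m+1}
        kappas.append(k)
        m = len(a) - 1
        if m > 0:
            denom = 1.0 - k * k              # must be nonzero
            a = [(a[i] + k * a[m - 1 - i]) / denom for i in range(m)]   # (11.87)
        else:
            a = []
    kappas.reverse()
    return kappas


a3 = parcor_to_transversal([0.5, -0.3, 0.2])
print(a3)
print(transversal_to_parcor(a3))
```

Running the two functions back to back returns the original PARCOR coefficients up to rounding, which is a convenient sanity check on any implementation of the tables.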


11.10.2 Levinson—Durbin Algorithm 


From Eq. (11.5), we note that the coefficients $a_{m,i}$'s of a transversal predictor are directly
related to the autocorrelation function of its input. The well-known Levinson-Durbin
algorithm is a computationally efficient procedure for solving the Wiener-Hopf equation
(11.5) of the transversal predictor. It also provides the PARCOR coefficients of the corresponding
lattice predictor. The efficiency is achieved by exploiting the fact that the
input x(n) is a stationary process. This will be clarified further at the end of this section.
With the background that we have already developed, derivation of the Levinson-Durbin
algorithm is straightforward. We note from Eq. (11.59)
$$\begin{aligned}
\kappa_{m+1} P_m &= E[f_m(n) b_m(n-1)] \\
&= E\left[f_m(n)\left(x(n-m-1) - \sum_{i=1}^{m} a_{m,m+1-i}\, x(n-i)\right)\right] \\
&= E[f_m(n) x(n-m-1)]
\end{aligned} \tag{11.88}$$
where the last equality follows from the identity $E[f_m(n) x(n-i)] = 0$, for $i = 1, 2, \ldots, m$.
Substituting Eq. (11.21) in Eq. (11.88), we obtain
$$\begin{aligned}
\kappa_{m+1} P_m &= E\left[\left(x(n) - \sum_{i=1}^{m} a_{m,i}\, x(n-i)\right) x(n-m-1)\right] \\
&= r(m+1) - \sum_{i=1}^{m} a_{m,i}\, r(m+1-i)
\end{aligned} \tag{11.89}$$
or
$$\kappa_{m+1} = \frac{r(m+1) - \sum_{i=1}^{m} a_{m,i}\, r(m+1-i)}{P_m} \tag{11.90}$$
Thus, given the autocorrelation coefficients $r(1), r(2), \ldots, r(m+1)$, the mth-order
transversal predictor coefficients $a_{m,i}$'s, and $P_m$, one can calculate $\kappa_{m+1}$ according to
Eq. (11.90). The order-update equations (11.84) are then used to obtain the coefficients
of the (m + 1)th-order transversal predictor, $a_{m+1,i}$'s. Equation (11.66) is used to
obtain $P_{m+1}$ for the next iteration. Table 11.3 summarizes these results. This is called the
Levinson-Durbin algorithm. The most important feature of the Levinson-Durbin
algorithm is its computational efficiency. Careful examination of Table 11.3 shows
that the implementation of the Levinson-Durbin algorithm requires about $M^2$
multiplications/divisions and the same number of additions/subtractions, where M is the order of the
predictor. This must be compared with $M^3$, which is the order of computations required
for solving a system of M linear equations without exploiting the structure in the
system. The special structure that is exploited here is the symmetric Toeplitz nature of
the autocorrelation matrix R. By definition, the autocorrelation matrix of any process
(stationary or nonstationary) is symmetric. But, if the process x(n) is stationary (at
least in the wide sense), then the autocorrelation matrix becomes Toeplitz, in addition to being
symmetric. That is, all the elements along any given diagonal are the same. For example,
the kth subdiagonal and superdiagonal consist of the autocorrelation at lag k.


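Table 11.3 Levinson-Durbin algorithm.

Given:    $r(0), r(1), \ldots, r(M)$
Required: $a_{M,1}, a_{M,2}, \ldots, a_{M,M}$
          and $\kappa_1, \kappa_2, \ldots, \kappa_M$

$P_0 = r(0)$
$\kappa_1 = r(1)/P_0$
$a_{1,1} = \kappa_1$
$P_1 = (1 - \kappa_1^2) P_0$
for m = 1 to M - 1
    $\kappa_{m+1} = \dfrac{r(m+1) - \sum_{i=1}^{m} a_{m,i}\, r(m+1-i)}{P_m}$
    $a_{m+1,i} = a_{m,i} - \kappa_{m+1} a_{m,m+1-i}$, for $i = 1, 2, \ldots, m$
    $a_{m+1,m+1} = \kappa_{m+1}$
    $P_{m+1} = (1 - \kappa_{m+1}^2) P_m$
end
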
In fact, all the results that we have derived in this chapter are under this stationarity
assumption on input x(n).

In the above two sections, we have derived procedures for (i) conversion between the
coefficients of lattice and transversal predictors and (ii) obtaining the coefficients of the
lattice and transversal predictors from the autocorrelation values of the input process.
Furthermore, given the power of the input sequence x(n), that is, $P_0 = r(0) = E[x^2(n)]$,
and the PARCOR coefficients $\kappa_1, \kappa_2, \ldots, \kappa_M$, one can develop a procedure to obtain the
autocorrelation coefficients $r(1), r(2), \ldots, r(M)$ (Problem P11.6). The latter coefficients
can also be obtained if $P_0$ and the coefficients $a_{M,1}, a_{M,2}, \ldots, a_{M,M}$ of an Mth-order
transversal predictor are available (Problem P11.6). All these possible conversions show
that the three sets of coefficients $(P_0, \kappa_1, \kappa_2, \ldots, \kappa_M)$, $(P_0, a_{M,1}, a_{M,2}, \ldots, a_{M,M})$, and
$(r(0), r(1), \ldots, r(M))$ are three different representations of the same information. When
M tends to infinity, these may be thought of as alternative representations of the power
spectral density of the input process x(n).
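
A minimal Python sketch of the procedure in Table 11.3 is given below. The function name and the return layout are assumptions made for this example. The test autocorrelation $r(k) = 0.5^k$ is a made-up sequence of the kind produced by a first-order autoregressive process, for which one expects a single nonzero PARCOR coefficient.

```python
def levinson_durbin(r):
    """Table 11.3: r = [r(0), ..., r(M)] -> (a_M, kappas, powers).

    a_M    = [a_{M,1}, ..., a_{M,M}]
    kappas = [kappa_1, ..., kappa_M]
    powers = [P_0, ..., P_M]
    """
    M = len(r) - 1
    a, kappas, powers = [], [], [r[0]]
    for m in range(M):
        # Numerator of Eq. (11.90): r(m+1) - sum_i a_{m,i} r(m+1-i)
        acc = r[m + 1] - sum(a[i] * r[m - i] for i in range(m))
        k = acc / powers[-1]
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]   # (11.84), (11.85)
        kappas.append(k)
        powers.append((1.0 - k * k) * powers[-1])               # Eq. (11.66)
    return a, kappas, powers


a, kappas, powers = levinson_durbin([1.0, 0.5, 0.25])
print(a, kappas, powers)
```

In this example the second PARCOR coefficient comes out as zero and the error power stops decreasing after the first stage, in line with the remark that later stages contribute little once the correlation has been removed. The double loop over m and i is where the roughly $M^2$ operation count comes from.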


11.10.3 Extension of Levinson—Durbin Algorithm 


The solution provided by the Levinson-Durbin algorithm is only applicable to the case where
the Wiener-Hopf equation to be solved corresponds to a predictor. In this section, the
Levinson-Durbin algorithm is extended to the case of the joint process estimator, so as to
handle the general case of estimating a signal from another related signal. Consider the
Wiener-Hopf equation
$$\mathbf{R}\mathbf{w} = \mathbf{p} \tag{11.91}$$
where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^{\mathrm{T}}(n)]$, $\mathbf{x}(n)$ is the input vector as defined in Eq. (11.69),
$\mathbf{p} = E[\mathbf{x}(n)d(n)]$, and d(n) is the desired output of the estimator.
The solution to Eq. (11.91) consists of three steps:

Step 1: The conventional Levinson-Durbin algorithm of Table 11.3 is used to obtain the
        elements of the matrix L of Eq. (11.70). The PARCOR coefficients $\kappa_1, \kappa_2, \ldots, \kappa_{N-1}$
        of the lattice predictor and the mean-squared values of the prediction errors,
        that is, $P_0, P_1, \ldots, P_{N-1}$, are also obtained in this process.

Step 2: The Wiener-Hopf equation corresponding to the linear combiner part of the lattice
        joint process estimator is built and solved. This gives the coefficient vector c.

Step 3: The tap-weight vector w is obtained according to the equation
$$\mathbf{w} = \mathbf{L}^{\mathrm{T}}\mathbf{c} \tag{11.92}$$
        This is obtained by premultiplying both sides of Eq. (11.75) by $\mathbf{L}^{\mathrm{T}}$.

The Wiener-Hopf equation for $\mathbf{c}$ is (Figure 11.7)
$$\mathbf{R}_{bb}\mathbf{c} = \mathbf{p}_{db} \tag{11.93}$$
where $\mathbf{R}_{bb} = E[\mathbf{b}(n)\mathbf{b}^{\mathrm{T}}(n)]$ and $\mathbf{p}_{db} = E[d(n)\mathbf{b}(n)]$. Property 4 of prediction errors
implies that the correlation matrix $\mathbf{R}_{bb}$ is diagonal. Furthermore, the diagonal
elements of $\mathbf{R}_{bb}$ are the mean-squared values of the backward prediction errors
$b_0(n), b_1(n), \ldots, b_{N-1}(n)$, that is, $P_0, P_1, \ldots, P_{N-1}$. Hence,
$$\mathbf{R}_{bb} = \mathrm{diag}(P_0, P_1, \ldots, P_{N-1}) \tag{11.94}$$
Also, the mth element of $\mathbf{p}_{db}$ is
$$\begin{aligned}
p_{db}(m) &= E[d(n) b_m(n)] \\
&= E\left[d(n)\left(x(n-m) - \sum_{i=0}^{m-1} a_{m,m-i}\, x(n-i)\right)\right] \\
&= p(m) - \sum_{i=0}^{m-1} a_{m,m-i}\, p(i)
\end{aligned} \tag{11.95}$$
where $p(i) = E[d(n) x(n-i)]$ is the ith element of the vector $\mathbf{p}$. Substituting Eqs. (11.94)
and (11.95) in Eq. (11.93), we obtain
$$c_m = \begin{cases}
\dfrac{p(0)}{P_0}, & m = 0 \\[2ex]
\dfrac{p(m) - \sum_{i=0}^{m-1} a_{m,m-i}\, p(i)}{P_m}, & m = 1, 2, \ldots, N-1
\end{cases} \tag{11.96}$$


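Table 11.4 Extended Levinson-Durbin algorithm.

Given:    $\mathbf{R}$ and $\mathbf{p}$
Required: $\mathbf{w} = \mathbf{R}^{-1}\mathbf{p}$

$P_0 = r(0)$
$c_0 = p(0)/P_0$
$w_0 = c_0$
$\kappa_1 = r(1)/P_0$
$a_{1,1} = \kappa_1$
$P_1 = (1 - \kappa_1^2) P_0$
$c_1 = \dfrac{p(1) - a_{1,1}\, p(0)}{P_1}$
$w_0 = c_0 - a_{1,1} c_1$
$w_1 = c_1$
for m = 1 to N - 2
    $\kappa_{m+1} = \dfrac{r(m+1) - \sum_{i=1}^{m} a_{m,i}\, r(m+1-i)}{P_m}$
    $a_{m+1,i} = a_{m,i} - \kappa_{m+1} a_{m,m+1-i}$, for $i = 1, 2, \ldots, m$
    $a_{m+1,m+1} = \kappa_{m+1}$
    $P_{m+1} = (1 - \kappa_{m+1}^2) P_m$
    $c_{m+1} = \dfrac{p(m+1) - \sum_{i=0}^{m} a_{m+1,m+1-i}\, p(i)}{P_{m+1}}$
    $w_i = w_i - a_{m+1,m+1-i}\, c_{m+1}$, for $i = 0, 1, \ldots, m$
    $w_{m+1} = c_{m+1}$
end
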
Table 11.4 summarizes the above results. The recursion for the transversal coefficients
$w_i$ follows from Eq. (11.92) (Problem P11.7). Here, the three steps listed above are
combined together under a single "for loop." Careful examination of Table 11.4 shows
that the computational complexity of the extended Levinson-Durbin algorithm is about
$2N^2$ multiplications/divisions and a similar number of additions/subtractions. If the joint
process estimator is to be implemented in lattice form, then the computations reduce to
about $1.5N^2$, as Step 3 of the algorithm can then be ignored.
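
The following Python sketch follows the structure of Table 11.4 and solves $\mathbf{R}\mathbf{w} = \mathbf{p}$ for a symmetric Toeplitz $\mathbf{R}$ described by its first column. The function name and the numerical values are assumptions made for illustration; the right-hand side in the example was generated from the known solution w = [1, 2, 3] so that the result can be checked by inspection.

```python
def extended_levinson_durbin(r, p):
    """Solve R w = p, where R is symmetric Toeplitz with first column r."""
    N = len(p)
    power = r[0]                             # P_0
    a = []                                   # predictor coefficients a_{m,i}
    c = p[0] / power                         # c_0, Eq. (11.96)
    w = [c]
    for m in range(N - 1):
        k = (r[m + 1] - sum(a[i] * r[m - i] for i in range(m))) / power
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]   # order m + 1
        power *= 1.0 - k * k                 # P_{m+1}, Eq. (11.66)
        # c_{m+1}: the coefficient multiplying p(i) is a_{m+1,m+1-i}
        c = (p[m + 1] - sum(a[m - i] * p[i] for i in range(m + 1))) / power
        # w = L^T c, accumulated one column of L^T at a time (Eq. 11.92)
        w = [w[i] - a[m - i] * c for i in range(m + 1)] + [c]
    return w


r = [1.0, 0.5, 0.25]
p = [2.75, 4.0, 4.25]          # equals R [1, 2, 3]^T for the r above
print(extended_levinson_durbin(r, p))
```

If only the lattice form is needed, the last line of the loop body can be dropped and the values of `c` collected instead, which corresponds to skipping Step 3.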


11.11 All-Pole Lattice Structure 


The system functions that we have considered so far are all in the form of all-zero filters.
In this section, we propose a lattice structure for implementation of an all-pole filter which
is characterized by the system function
$$F(z) = \frac{1}{H_{f_M}(z)} = \frac{1}{1 - \sum_{i=1}^{M} a_{M,i} z^{-i}} \tag{11.97}$$
This choice of F(z) is not restrictive except that it should be the system function of a
stable system. This is the consequence of an important result of the theory of lattice filters
which states that a forward prediction-error filter is always minimum phase. This result, proof
of which is beyond the scope of our discussion in this book, implies that the zeros of the
system function $H_{f_M}(z)$ of any such prediction-error filter are all less than one in magnitude.
It can also be shown that for any arbitrary minimum phase system function $H_{f_M}(z)$, one
can always find a process whose forward prediction-error filter is $H_{f_M}(z)$. Since $H_{f_M}(z)$
is a prediction-error filter, if we excite $F(z) = 1/H_{f_M}(z)$ with the Mth-order forward
prediction error, $f_M(n)$, of x(n), the output will be the original process x(n). In other
words, the system function which relates $f_M(n)$ and x(n) is F(z). With this in view, we
recall the order-update equations (11.60) and (11.61) and rearrange (11.60) as
$$f_m(n) = f_{m+1}(n) + \kappa_{m+1} b_m(n-1) \tag{11.98}$$
Considering Eqs. (11.98) and (11.61), for values of $m = 0, 1, \ldots, M-1$, one can suggest
the block diagram of Figure 11.8a. The detail of the mth stage of this block diagram is
given in Figure 11.8b. The input sequence in Figure 11.8a is $f_M(n)$ and the generated
output is $f_0(n) = x(n)$. We also recall that $b_0(n) = x(n)$. From our discussion above, this
observation implies that the block diagram of Figure 11.8a is the lattice realization of
F(z).

We shall comment that although in the development of the lattice structure of the all-pole
system function F(z) we used $f_M(n)$ as the input, the choice of the input to the
latter structure is not limited to $f_M(n)$. It can be any arbitrary input.
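
The all-pole lattice can be sketched in Python by running Eq. (11.98) from stage M down to stage 0 for every input sample, and using Eq. (11.61) to refresh the backward errors. The function names, the zero initial state, and the PARCOR values are assumptions for this illustration. An analysis lattice is included so that the inverse relationship between the two structures can be checked.

```python
def forward_error(x, kappas):
    """Analysis lattice: return f_M(n) for input x(n), zero initial state."""
    b_prev = [0.0] * (len(kappas) + 1)
    out = []
    for sample in x:
        f = float(sample)
        b = [f]                                  # b_0(n) = x(n)
        for m, k in enumerate(kappas):
            f_next = f - k * b_prev[m]           # Eq. (11.60)
            b.append(b_prev[m] - k * f)          # Eq. (11.61)
            f = f_next
        b_prev = b
        out.append(f)
    return out


def all_pole_lattice(u, kappas):
    """Synthesis lattice: treat u(n) as f_M(n) and return f_0(n)."""
    M = len(kappas)
    b_prev = [0.0] * (M + 1)
    out = []
    for sample in u:
        f = float(sample)                            # f_M(n)
        b = [0.0] * (M + 1)
        for m in range(M - 1, -1, -1):
            f = f + kappas[m] * b_prev[m]            # Eq. (11.98): f_m(n)
            b[m + 1] = b_prev[m] - kappas[m] * f     # Eq. (11.61): b_{m+1}(n)
        b[0] = f                                     # b_0(n) = f_0(n)
        b_prev = b
        out.append(f)
    return out


print(all_pole_lattice([1.0, 0.0, 0.0, 0.0], [0.5]))   # impulse response of 1/(1 - 0.5 z^-1)
x = [1.0, -2.0, 0.5, 3.0, 1.5]
print(all_pole_lattice(forward_error(x, [0.5, -0.3, 0.2]), [0.5, -0.3, 0.2]))
```

The second call recovers the original sequence x(n) up to rounding, which illustrates that the structure realizes $1/H_{f_M}(z)$. Stability of the recursion relies on all PARCOR coefficients having magnitude less than one.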


11.12 Pole-Zero Lattice Structure 


In this section, we extend the all-pole lattice structure of the last section to an arbitrary
system function G(z) with M zeros and M poles. With no loss of generality, we let
$$G(z) = \frac{\sum_{i=0}^{M} w_i z^{-i}}{H_{f_M}(z)} \tag{11.99}$$

Figure 11.8 Lattice all-pole filter: (a) overall structure and (b) details of one stage.

We note that the denominator of G(z) is assumed to be the system function of a prediction-error
filter. This, as was noted in the last section, is not restrictive, as this condition only
limits the poles of G(z) to remain within the unit circle in the z-plane. In other words,
the condition imposed on G(z) is just to guarantee its stability.

To develop a lattice structure for G(z), we first rearrange Eq. (11.75) as
$$\mathbf{w} = \mathbf{L}^{\mathrm{T}}\mathbf{c} \tag{11.100}$$
Next, we define $\mathbf{z} = [1\ \ z^{-1}\ \ z^{-2}\ \ \cdots\ \ z^{-(N-1)}]^{\mathrm{T}}$, where z is the z-domain complex variable,
multiply the transpose of both sides of Eq. (11.100) from the right by $\mathbf{z}$, and replace
$N-1$ by M, to obtain
$$W(z) = \sum_{i=0}^{M} c_i H_{b_i}(z) \tag{11.101}$$
where $c_i$'s are the elements of vector $\mathbf{c}$,
$$W(z) = \sum_{i=0}^{M} w_i z^{-i} \tag{11.102}$$
and $H_{b_i}(z)$ is defined as in Eq. (11.80). Equation (11.101) shows that any arbitrary order
M FIR system function W(z) can equivalently be realized as a linear combination of the
backward prediction-error filter system functions $H_{b_0}(z), H_{b_1}(z), \ldots, H_{b_M}(z)$.

Figure 11.9 Pole-zero lattice for an arbitrary system function.


Using Eqs. (11.101) and (11.102) in Eq. (11.99), we obtain
$$G(z) = \sum_{i=0}^{M} c_i \frac{H_{b_i}(z)}{H_{f_M}(z)} \tag{11.103}$$
Furthermore, with reference to Figure 11.8a, we note that $H_{b_i}(z)/H_{f_M}(z)$ is the transfer
function relating $f_M(n)$ and $b_i(n)$. It is obtained as the cascade of the transfer function
between $f_M(n)$ and x(n), that is, $1/H_{f_M}(z)$, and the transfer function between x(n) and
$b_i(n)$, that is, $H_{b_i}(z)$. Using these results, we obtain Figure 11.9 as the lattice realization of
the system function G(z).


11.13 Adaptive Lattice Filter 


In this section, an LMS algorithm for adaptive adjustment of the parameters of the lattice 
joint process estimator, given in Figure 11.7, is developed. Simulation results and some 
discussions on the performance of adaptive lattice filters are provided in the next section. 

As the prediction error power becomes minimum when the predictor coefficients are
chosen optimally, the optimum PARCOR coefficient $\kappa_m$ of the mth stage of a lattice
predictor is obtained by minimizing the cost function
$$\xi_{p,m} = E[f_m^2(n) + b_m^2(n)] \tag{11.104}$$
Minimizing $\xi_{p,m}$ is equivalent to minimizing either of the cost functions $P_m^f = E[f_m^2(n)]$ and
$P_m^b = E[b_m^2(n)]$, since forward and backward predictors of the same order share the same
set of coefficients and also the same level of minimum mean-squared error. Defining
the cost function as in Eq. (11.104) lets us use both forward and backward prediction
errors in the LMS algorithm, so that a lower misadjustment can be achieved.

The LMS algorithm for minimization of the cost function $\xi_{p,m}$ is implemented according
to the recursive equation
$$\kappa_m(n+1) = \kappa_m(n) - \mu_{p,m}(n) \frac{\partial \hat{\xi}_{p,m}(n)}{\partial \kappa_m} \tag{11.105}$$
where $\mu_{p,m}(n)$ is the algorithm step-size and
$$\hat{\xi}_{p,m}(n) = f_m^2(n) + b_m^2(n) \tag{11.106}$$
is an estimate of the cost function $\xi_{p,m}$ based on the most recent samples of the forward
and backward prediction errors. Substituting Eq. (11.106) in Eq. (11.105) and using Eqs.
(11.60) and (11.61), we obtain
$$\kappa_m(n+1) = \kappa_m(n) + 2\mu_{p,m}(n)[f_m(n) b_{m-1}(n-1) + b_m(n) f_{m-1}(n)] \tag{11.107}$$
To assure fast convergence of the algorithm, the step-size $\mu_{p,m}(n)$ is normalized by the
signal power at the input to the mth stage of the predictor. To estimate this power, we
use the recursive equation
$$P_{m-1}(n) = \beta P_{m-1}(n-1) + 0.5(1-\beta)[f_{m-1}^2(n) + b_{m-1}^2(n-1)] \tag{11.108}$$
The normalized step-size parameter is then given by
$$\mu_{p,m}(n) = \frac{\mu_{p,0}}{P_{m-1}(n) + \epsilon} \tag{11.109}$$
where $\mu_{p,0}$ is an unnormalized step-size parameter common to all stages of the predictor,
and $\epsilon$ is a small positive constant which is added to prevent instability of the algorithm
when $P_{m-1}(n)$ assumes values close to 0.

The step-normalized LMS algorithm is also used for adaptation of the $c_i$ coefficients of the
linear combiner part of the lattice joint process estimator. The derivation of this procedure
is the same as the recursions developed in Chapter 7. The result is
$$\mathbf{c}(n+1) = \mathbf{c}(n) + 2\boldsymbol{\mu}_c e(n) \mathbf{b}(n) \tag{11.110}$$
where $e(n) = d(n) - y(n)$, y(n) is obtained according to Eq. (11.76), and $\boldsymbol{\mu}_c$ is a diagonal
matrix consisting of the normalized step-size parameters
$$\mu_{c,m} = \frac{\mu_{c,0}}{P_m(n) + \epsilon}, \quad \text{for } m = 0, 1, \ldots, N-1 \tag{11.111}$$
where $\mu_{c,0}$ is an unnormalized step-size which, in general, may be different from $\mu_{p,0}$.
Table 11.5 gives a summary of the above results.

Table 11.5 LMS algorithm for adaptive lattice joint process estimator.

Given:    Estimator parameters:
            $\kappa_1(n), \kappa_2(n), \ldots, \kappa_{N-1}(n)$
            and $\mathbf{c}(n) = [c_0(n)\ \ c_1(n)\ \ \cdots\ \ c_{N-1}(n)]^{\mathrm{T}}$,
          the most recent input sample x(n), desired output d(n),
          backward prediction error vector
            $\mathbf{b}(n-1) = [b_0(n-1)\ \ b_1(n-1)\ \ \cdots\ \ b_{N-1}(n-1)]^{\mathrm{T}}$,
          and power estimates $P_0(n-1), P_1(n-1), \ldots, P_{N-1}(n-1)$.
Required: Estimator parameter updates:
            $\kappa_1(n+1), \kappa_2(n+1), \ldots, \kappa_{N-1}(n+1)$
            and $\mathbf{c}(n+1) = [c_0(n+1)\ \ c_1(n+1)\ \ \cdots\ \ c_{N-1}(n+1)]^{\mathrm{T}}$,
          backward prediction error vector
            $\mathbf{b}(n) = [b_0(n)\ \ b_1(n)\ \ \cdots\ \ b_{N-1}(n)]^{\mathrm{T}}$,
          and power estimates $P_0(n), P_1(n), \ldots, P_{N-1}(n)$.

*** Lattice Predictor Part ***

$f_0(n) = b_0(n) = x(n)$
$P_0(n) = \beta P_0(n-1) + 0.5(1-\beta)[f_0^2(n) + b_0^2(n-1)]$
for m = 1 to N - 1
    $f_m(n) = f_{m-1}(n) - \kappa_m(n) b_{m-1}(n-1)$
    $b_m(n) = b_{m-1}(n-1) - \kappa_m(n) f_{m-1}(n)$
    $\kappa_m(n+1) = \kappa_m(n) + \dfrac{2\mu_{p,0}}{P_{m-1}(n) + \epsilon}[f_m(n) b_{m-1}(n-1) + b_m(n) f_{m-1}(n)]$
    $P_m(n) = \beta P_m(n-1) + 0.5(1-\beta)[f_m^2(n) + b_m^2(n-1)]$
end

*** Linear Combiner Part ***

$y(n) = \mathbf{c}^{\mathrm{T}}(n)\mathbf{b}(n)$
$e(n) = d(n) - y(n)$
$\boldsymbol{\mu}_c = \mu_{c,0}\,\mathrm{diag}((P_0(n)+\epsilon)^{-1}, (P_1(n)+\epsilon)^{-1}, \ldots, (P_{N-1}(n)+\epsilon)^{-1})$
$\mathbf{c}(n+1) = \mathbf{c}(n) + 2\boldsymbol{\mu}_c e(n)\mathbf{b}(n)$
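
The steps of Table 11.5 translate almost line by line into code. The Python sketch below is one possible reading of the table, not a reference implementation: the function name, the initial values (zero coefficients, unit power estimates), the forgetting factor 0.99, and the step sizes are all assumptions chosen for a small demonstration. The demonstration drives a two-tap estimator with a synthetic first-order autoregressive input and uses d(n) = x(n), for which the combiner should settle near c = [1, 0] and the single PARCOR coefficient near the lag-one correlation coefficient 0.8.

```python
import random


def adaptive_lattice(x, d, N, mu_p0, mu_c0, beta=0.99, eps=0.02):
    """Run the lattice LMS joint process estimator over sequences x and d."""
    kappa = [0.0] * N            # kappa[m] for m = 1..N-1 (index 0 unused)
    c = [0.0] * N                # linear combiner coefficients
    P = [1.0] * N                # power estimates (assumed initial value)
    b_prev = [0.0] * N           # b_m(n - 1)
    errors = []
    for xn, dn in zip(x, d):
        f = [0.0] * N
        b = [0.0] * N
        # --- lattice predictor part ---
        f[0] = b[0] = xn
        P[0] = beta * P[0] + 0.5 * (1 - beta) * (f[0] ** 2 + b_prev[0] ** 2)
        for m in range(1, N):
            f[m] = f[m - 1] - kappa[m] * b_prev[m - 1]
            b[m] = b_prev[m - 1] - kappa[m] * f[m - 1]
            step = 2 * mu_p0 / (P[m - 1] + eps)                  # Eq. (11.109)
            kappa[m] += step * (f[m] * b_prev[m - 1] + b[m] * f[m - 1])
            P[m] = beta * P[m] + 0.5 * (1 - beta) * (f[m] ** 2 + b_prev[m] ** 2)
        # --- linear combiner part ---
        y = sum(ci * bi for ci, bi in zip(c, b))
        e = dn - y
        for m in range(N):
            c[m] += 2 * mu_c0 / (P[m] + eps) * e * b[m]          # Eq. (11.110)
        b_prev = b
        errors.append(e)
    return kappa, c, errors


random.seed(1)
x, prev = [], 0.0
for _ in range(10000):
    prev = 0.8 * prev + random.gauss(0.0, 1.0)   # synthetic AR(1) input
    x.append(prev)

kappa, c, errors = adaptive_lattice(x, x, N=2, mu_p0=0.001, mu_c0=0.02)
print(kappa[1], c, abs(errors[-1]))
```

Because the PARCOR coefficients keep fluctuating around their optimum values, the combiner coefficients generally have to track them; in this particular demonstration that coupling is hidden, since d(n) = x(n) makes the optimum c independent of the PARCOR coefficient.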


11.13.1 Discussion and Simulations 


Analysis of the convergence behavior of the LMS algorithm when applied to a lattice
structure is rather difficult. In particular, in the case of the lattice joint process estimator there
are two sets of parameters which are being adapted simultaneously. The optimum values
of the PARCOR coefficients, $\kappa_m$'s, depend only on the statistics of the input signal. The
optimum value of the coefficient vector c of the linear combiner part depends on the
current values of the PARCOR coefficients as well as the optimum value of the coefficient
vector w in the original transversal filter, $\mathbf{w}_o$; see Eq. (11.75). An important point to
be noted is that even if $\mathbf{w}_o$, that is, the optimum impulse response to be realized by the
estimator, is fixed, any change in the PARCOR coefficients will require readjustment of
the coefficient vector c. This, as we will demonstrate by a simulation example, may lead
to a significant increase in misadjustment.
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Figure 11.10 A modeling problem. 


To demonstrate the above phenomenon, we consider the modeling problem shown in
Figure 11.10. A plant $W_o(z)$ is to be modeled by an adaptive filter W(z). The common
input signal to the plant and adaptive filter, x(n), is generated by passing a unit variance
white Gaussian process, v(n), through a coloring filter with the system function

$$H_1(z) = K_1 \frac{1 - 1.2z^{-1}}{1 - 1.2z^{-1} + 0.8z^{-2}} \qquad (11.112)$$

The coefficient $K_1$ is set equal to 0.488. This results in a sequence with unit variance. For
our later use, the colored process generated by $H_1(z)$ will be called $x_1(n)$. Figure 11.11
shows the power spectral density of $x_1(n)$ evaluated using

$$\Phi_{x_1x_1}(e^{j\omega}) = \Phi_{vv}(e^{j\omega})\,|H_1(e^{j\omega})|^2 \qquad (11.113)$$

with $\Phi_{vv}(e^{j\omega}) = 1$. Observe that $x_1(n)$ is highly colored and the eigenvalue spread of its
corresponding correlation matrix can be as large as 338. This is obtained as the ratio of
maximum to minimum value of the spectral density; see Chapter 4.
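The two numbers quoted above can be checked with a short calculation. The sketch below assumes the coloring filter of Eq. (11.112) with $K_1 = 0.488$ and a unit-variance white input. The output power is taken as the gain squared times the energy of the truncated impulse response, and the spectral ratio is evaluated on a uniform frequency grid. The helper names are illustrative only.

```python
import cmath
import math

K1 = 0.488
NUM = [1.0, -1.2]          # 1 - 1.2 z^-1
DEN = [1.0, -1.2, 0.8]     # 1 - 1.2 z^-1 + 0.8 z^-2

def impulse_response(num, den, length):
    """First `length` impulse-response samples of num(z)/den(z), den[0] == 1."""
    h = []
    for n in range(length):
        acc = num[n] if n < len(num) else 0.0
        for k in range(1, len(den)):
            if n - k >= 0:
                acc -= den[k] * h[n - k]
        h.append(acc)
    return h

def output_power(gain, num, den, length=2000):
    """Output variance for a unit-variance white input."""
    return gain ** 2 * sum(v * v for v in impulse_response(num, den, length))

def psd(gain, num, den, omega):
    """gain^2 |num/den|^2 evaluated on the unit circle at frequency omega."""
    z_inv = cmath.exp(-1j * omega)
    n = sum(c * z_inv ** k for k, c in enumerate(num))
    d = sum(c * z_inv ** k for k, c in enumerate(den))
    return gain ** 2 * abs(n / d) ** 2

def psd_spread(gain, num, den, points=4096):
    """Ratio of the largest to the smallest PSD value on [0, pi]."""
    vals = [psd(gain, num, den, math.pi * k / points) for k in range(points + 1)]
    return max(vals) / min(vals)
```

`output_power(K1, NUM, DEN)` should come out very close to 1, and `psd_spread(K1, NUM, DEN)` should land in the neighborhood of the value quoted in the text.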

In this simulation example, we consider realizing the adaptive filter W(z) in transversal
as well as lattice (i.e., the lattice joint process estimator of Figure 11.7) forms. The plant
and the adaptive filter are both selected to have a length of N = 30. This choice of N
for the present input sequence results in an eigenvalue spread of 300. The variance of the
additive white noise sequence $e_o(n)$ at the plant output is set equal to $\sigma_{e_o}^2 = 10^{-4}$. So,
when the adaptive filter W(z) is set to its optimum choice, $W_o(z)$, the resulting minimum
mean-squared error will be $\sigma_{e_o}^2 = 10^{-4}$.

Figure 11.12 presents a pair of learning curves of the modeling problem. The curves
correspond to transversal and lattice LMS, and each is an ensemble average of 50 independent
runs. The final curves have been smoothed. In the case of transversal LMS, the
step-size parameter, $\mu$, is selected according to Eq. (6.64) to result in 10% misadjustment.
For lattice LMS, the following parameters are used: $\epsilon = 0.02$, $\mu_{p,o} = 0.001$ (for the first
2500 iterations only) and $\mu_{c,o} = 0.1/N = 0.0033$. This choice of $\mu_{c,o}$ would result in a
misadjustment of about 10% if the perturbations of the PARCOR coefficients, which arise from
their adaptation, could be ignored. To demonstrate the effect of the perturbation of the PARCOR
coefficients, $\mu_{p,o}$ is forced to zero after the first 2500 iterations, so that the PARCOR
coefficients remain fixed from iteration 2500. Observe that the perturbation of the PARCOR
coefficients has a significant impact on the misadjustment of the lattice LMS algorithm.
Once the adjustment of the PARCOR coefficients is stopped, we see a fast convergence of the
algorithm with a misadjustment close to what we predicted before. At iteration 2500, the
PARCOR coefficients are already near their optimal values and the backward prediction
errors are almost uncorrelated with one another. This is why the lattice LMS converges
faster than the transversal LMS after iteration 2500.

Figure 11.11 Power spectral density of the input process for the modeling problem simulations.

Figure 11.12 Learning curves showing convergence of the transversal and lattice LMS applied
to the modeling problem of Figure 11.10.

The perturbation of the PARCOR coefficients is a serious problem which limits the
application of the adaptive lattice joint process estimator. As the above example demonstrated,
unless the adjustment of the PARCOR coefficients is stopped after some initial convergence, the
lattice LMS cannot be relied on as a good choice for improving the convergence performance
of adaptive filters. The large misadjustment arising from adaptation of the PARCOR
coefficients prohibits its applicability. The problem may be more serious when the input signal
is nonstationary. In that case, the optimum PARCOR coefficients are time-varying, as they
follow the time-varying statistics of the input. This, in turn, necessitates continuous
adaptation of the PARCOR coefficients as well as the coefficient vector $\mathbf{c}$. The inevitable lag
in the adaptation of $\mathbf{c}$ will result in a further increase of the mean-squared error. This in
effect means higher misadjustment.


11.14 Autoregressive Modeling of Random Processes 


A random process x(n) is said to be AR of order M if it can be generated through a
difference equation of the form

$$x(n) = \sum_{i=1}^{M} h_i x(n-i) + v(n) \qquad (11.114)$$

where the $h_i$'s are the AR coefficients and v(n) is a zero-mean white noise process which is called
the innovation of x(n). This implies that any new sample of the process, x(n), is related to its
previous M samples according to the summation on the right-hand side of Eq. (11.114). In
addition, there is a new piece of information (namely, the innovation v(n)) in x(n) which
is uncorrelated with its previous samples. This, in turn, implies that the best linear prediction
of x(n) based on its past M samples is nothing but the summation on the right-hand side
of Eq. (11.114). Moreover, the latter estimate cannot be improved by increasing the order
of the predictor beyond M, as the portion of x(n) which could not be estimated by the
latter summation, that is, v(n), has no correlation with the more distant past samples of x(n), that is,
x(n - M - 1), x(n - M - 2), ... A procedure for analytical derivation of these results is
discussed in Problem P11.26.

An AR process x(n) can be characterized by its model, which may be obtained by
passing x(n) through a forward (or backward) linear predictor and optimizing the predictor
coefficients by minimizing the mean-squared error of its output. This results in a set of
predictor coefficients which match the coefficients $h_i$ of Eq. (11.114). In particular, if a
predictor of order M' > M is used, we obtain

$$a_{M',i} = \begin{cases} h_i, & 1 \le i \le M \\ 0, & M+1 \le i \le M' \end{cases} \qquad (11.115)$$

and

$$P_{M'} = P_M = \sigma_v^2 \qquad (11.116)$$

where $\sigma_v^2$ is the variance of v(n). An important point to be noted here is that the set
of coefficients $a_{M,1}, a_{M,2}, \ldots, a_{M,M}$ and $P_M$ provide sufficient information to obtain the
autocorrelation function of x(n) for any arbitrary lag. This directly follows from Eq.
(11.114) with $h_i = a_{M,i}$. To see this, multiplying Eq. (11.114) on both sides by x(n - k),
taking expectations, and noting that v(n) is uncorrelated with x(n - k) for k > 0, we obtain

$$r(k) = \sum_{i=1}^{M} a_{M,i}\, r(k-i), \qquad k > 0 \qquad (11.117)$$

The value of r(0) can be obtained using Eq. (11.66) as

$$r(0) = \left(\prod_{i=1}^{M} (1 - \kappa_i^2)\right)^{-1} P_M \qquad (11.118)$$

where the PARCOR coefficients can be obtained using the inverse Levinson-Durbin
algorithm discussed in Section 11.10.
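As a worked illustration of Eqs. (11.117) and (11.118), the following sketch recovers the autocorrelation sequence from the order-M predictor coefficients and the prediction error power. It assumes the standard Levinson-Durbin order-update relations, which may differ in form from the presentation in Section 11.10. The function names and the AR(2) test values are illustrative.

```python
def step_down(a_M):
    """Inverse Levinson-Durbin recursion.

    a_M = [a_{M,1}, ..., a_{M,M}]. Returns (kappas, predictors), where
    predictors[m-1] = [a_{m,1}, ..., a_{m,m}] and kappas[m-1] = kappa_m.
    """
    M = len(a_M)
    predictors = [None] * M
    kappas = [0.0] * M
    a = list(a_M)
    for m in range(M, 0, -1):
        predictors[m - 1] = a
        k = a[m - 1]                      # kappa_m = a_{m,m}
        kappas[m - 1] = k
        # a_{m-1,j} = (a_{m,j} + kappa_m a_{m,m-j}) / (1 - kappa_m^2)
        a = [(a[j] + k * a[m - 2 - j]) / (1.0 - k * k) for j in range(m - 1)]
    return kappas, predictors

def autocorrelation(a_M, P_M, max_lag):
    """r(0), ..., r(max_lag) of an AR process from a_M and P_M."""
    kappas, predictors = step_down(a_M)
    M = len(a_M)
    r0 = P_M
    for k in kappas:
        r0 /= (1.0 - k * k)               # Eq. (11.118)
    r = [r0]
    P = r0                                # P_0 = r(0)
    for m in range(1, M + 1):
        prev = predictors[m - 2] if m > 1 else []
        # r(m) = kappa_m P_{m-1} + sum_{i=1}^{m-1} a_{m-1,i} r(m-i)
        r.append(kappas[m - 1] * P
                 + sum(prev[i - 1] * r[m - i] for i in range(1, m)))
        P *= (1.0 - kappas[m - 1] ** 2)
    for k in range(M + 1, max_lag + 1):   # Eq. (11.117) for lags beyond M
        r.append(sum(a_M[i - 1] * r[k - i] for i in range(1, M + 1)))
    return r[: max_lag + 1]
```

For example, `autocorrelation([1.2, -0.8], 1.0, 4)` treats an AR(2) process with unit innovation variance and gives r(0) = 5, r(1) = 10/3 and r(2) = 0, which can be confirmed by hand from Eq. (11.117).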

It may also be noted that the estimated AR coefficients may be used to obtain the power
spectral density of x(n) according to the following equation

$$\Phi_{xx}(e^{j\omega}) = P_M\,|H_{AR}(e^{j\omega})|^2 \qquad (11.119)$$

where it has been noted that $\Phi_{vv}(e^{j\omega}) = \sigma_v^2 = P_M$, and

$$H_{AR}(z) = \frac{1}{1 - \sum_{i=1}^{M} a_{M,i} z^{-i}} \qquad (11.120)$$

It is also instructive to note that the process x(n) can be reconstructed by passing its
innovation v(n) through $H_{AR}(z)$.
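Equations (11.119) and (11.120) translate directly into code. The sketch below evaluates the AR spectrum at a single frequency; the function name `ar_psd` and its argument order are illustrative choices.

```python
import cmath

def ar_psd(a, P_M, omega):
    """AR power spectral density P_M / |1 - sum_i a_i e^{-j omega i}|^2.

    a = [a_{M,1}, ..., a_{M,M}] are the predictor (AR) coefficients and
    P_M is the prediction error power.
    """
    denom = 1.0 - sum(a_i * cmath.exp(-1j * omega * (i + 1))
                      for i, a_i in enumerate(a))
    return P_M / abs(denom) ** 2
```

For a first-order model with a = [0.5] and P_M = 1, the spectrum is 4 at zero frequency and 4/9 at the frequency pi, as expected for a low-pass process.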

Although many of the practically arising processes may not be truly AR, AR modeling 
of arbitrary processes for the purpose of spectral estimation has been found to be quite 
effective, provided that a sufficiently large order is considered. Usually, one finds that a 
model order in the range 5 to 10 is more than sufficient to get an acceptable estimate of 
the power spectral density for most of the processes encountered in practice. 

In the context of adaptive filters, the above results have the following implication. The 
correlation matrix of the input process to an adaptive transversal filter may be charac- 
terized by an AR model whose order may be much less than the order of the adaptive 
filter. This, as we shall see in the next section, may effectively be used to improve the 
performance of adaptive filters, at very little computational cost. 


11.15 Adaptive Algorithms Based on Autoregressive Modeling


The LMS-Newton algorithm was introduced in Chapter 7 as a method to solve the
eigenvalue spread problem of adaptive filters whose inputs are colored. In this section,
the results of the previous sections are used to propose two efficient implementations of
the LMS-Newton algorithm. We assume that the input sequence to the adaptive filter can
be modeled as an AR process whose order may be kept much lower than the adaptive
filter length. The two implementations (referred to as Algorithm 1 and Algorithm 2) differ
in their structural complexity. The first algorithm, which will be an exact implementation
of the LMS-Newton algorithm if the AR modeling assumption is accurate, is structurally
complicated and fits best into a DSP-based implementation. On the other hand, the second
algorithm is structurally simple and is tailored more toward VLSI custom chip design.

We recall that the LMS-Newton algorithm recursion for an adaptive filter with
real-valued input is

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n)\hat{\mathbf{R}}_{xx}^{-1}\mathbf{x}(n) \qquad (11.121)$$

where $\mathbf{w}(n) = [w_0(n)\ w_1(n)\ \cdots\ w_{N-1}(n)]^T$ is the filter tap-weight vector,
$\mathbf{x}(n) = [x(n)\ x(n-1)\ \cdots\ x(n-N+1)]^T$ is the filter input vector, $\hat{\mathbf{R}}_{xx}$ is an estimate of the
input correlation matrix $\mathbf{R}_{xx} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$, $\mu$ is the algorithm step-size parameter,
e(n) = d(n) - y(n) is the measured error at the filter output, d(n) is the desired output
and $y(n) = \mathbf{w}^T(n)\mathbf{x}(n)$ is the filter output.

It may be noted that, here, we have added the subscript "xx" to $\mathbf{R}$ to emphasize that
it corresponds to the input vector x(n). We follow this notation in the rest of this chapter,
as we need to refer to a number of different correlation matrices.

To implement the LMS-Newton algorithm, one needs to calculate $\hat{\mathbf{R}}_{xx}^{-1}\mathbf{x}(n)$ for each
update of recursion (11.121). A straightforward way would be to obtain an estimate of $\mathbf{R}_{xx}^{-1}$
first, and then perform the matrix by vector multiplication $\hat{\mathbf{R}}_{xx}^{-1}\mathbf{x}(n)$. This, of course,
is inefficient and, therefore, an alternative solution has to be found. Here, we propose
an efficient method for direct updating of the vector $\hat{\mathbf{R}}_{xx}^{-1}\mathbf{x}(n)$, without estimating
$\mathbf{R}_{xx}^{-1}$. For this, we note that the vector x(n) may be converted to the vector
$\mathbf{b}(n) = [b_0(n)\ b_1(n)\ \cdots\ b_{N-1}(n)]^T$ made up of the backward prediction errors of x(n) for the
predictors of orders 0 to N - 1. The vectors x(n) and b(n) are related according to Eq.
(11.68). We also recall that the elements of b(n), that is, the backward prediction errors
$b_0(n), b_1(n), \ldots, b_{N-1}(n)$, are uncorrelated with one another. This means that the
correlation matrix $\mathbf{R}_{bb} = E[\mathbf{b}(n)\mathbf{b}^T(n)]$ is diagonal, and, therefore, evaluation of its inverse is
trivial. Furthermore, using Eq. (11.68), we obtain

$$\mathbf{R}_{bb} = E[\mathbf{L}\mathbf{x}(n)(\mathbf{L}\mathbf{x}(n))^T] = \mathbf{L}\mathbf{R}_{xx}\mathbf{L}^T \qquad (11.122)$$

Inverting both sides of Eq. (11.122) and pre- and postmultiplying the result by $\mathbf{L}^T$ and
$\mathbf{L}$, respectively, we obtain

$$\mathbf{R}_{xx}^{-1} = \mathbf{L}^T\mathbf{R}_{bb}^{-1}\mathbf{L} \qquad (11.123)$$

Next, we define $\mathbf{u}(n) = \mathbf{R}_{xx}^{-1}\mathbf{x}(n)$ and substitute for $\mathbf{R}_{xx}^{-1}$ from Eq. (11.123), to obtain

$$\mathbf{u}(n) = \mathbf{L}^T\mathbf{R}_{bb}^{-1}\mathbf{L}\mathbf{x}(n) = \mathbf{L}^T\mathbf{R}_{bb}^{-1}\mathbf{b}(n) \qquad (11.124)$$

This result is fundamental to the derivation of the algorithms that follow.
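The factorization of Eq. (11.123) is easy to confirm numerically on a small example. The sketch below assumes a first-order AR input with coefficient `a` and unit innovation variance. In that case every backward predictor of order one or higher uses the single coefficient `a`, and the backward error powers are r(0), 1, 1, and so on. The helper names are illustrative.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def ar1_identity_residual(a=0.5, N=4):
    """Largest deviation of (L^T R_bb^{-1} L) R_xx from the identity."""
    # autocorrelation of an AR(1) process with unit innovation variance
    r = [a ** k / (1.0 - a * a) for k in range(N)]
    R_xx = [[r[abs(i - j)] for j in range(N)] for i in range(N)]
    # rows of L: b_0(n) = x(n), b_m(n) = x(n-m) - a x(n-m+1) for m >= 1
    L = [[0.0] * N for _ in range(N)]
    for m in range(N):
        L[m][m] = 1.0
        if m > 0:
            L[m][m - 1] = -a
    # backward prediction error powers: P_0 = r(0), P_m = 1 for m >= 1
    P = [r[0]] + [1.0] * (N - 1)
    R_bb_inv = [[(1.0 / P[i] if i == j else 0.0) for j in range(N)]
                for i in range(N)]
    R_xx_inv = matmul(matmul(transpose(L), R_bb_inv), L)
    prod = matmul(R_xx_inv, R_xx)
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(N) for j in range(N))
```

The returned residual should be at the level of floating-point rounding, which shows that the banded product reproduces the inverse of the correlation matrix without any explicit matrix inversion.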
In the rest of this section, for the sake of convenience, we shall use the notation u(n)
even when $\mathbf{R}_{xx}$ is replaced by its estimate, $\hat{\mathbf{R}}_{xx}$.


11.15.1 Algorithms 


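Algorithm 1: Implementation of Eq. (11.123) requires a mechanism for converting the
vector of input samples, x(n), to the vector of backward prediction error samples, b(n). A
lattice predictor may be used for efficient implementation of this mechanism. Moreover,
if one assumes that the input sequence, x(n), can be modeled as an AR process of order
M < N, then a lattice predictor of order M will suffice, and the matrix L and vector
b(n) take the following forms:

$$\mathbf{L} = \begin{bmatrix}
1 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & 0 \\
-a_{1,1} & 1 & \cdots & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
-a_{M,M} & -a_{M,M-1} & \cdots & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & -a_{M,M} & \cdots & -a_{M,1} & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & -a_{M,2} & -a_{M,1} & 1
\end{bmatrix} \qquad (11.125)$$

and

$$\mathbf{b}(n) = [b_0(n)\ b_1(n)\ \cdots\ b_M(n)\ b_M(n-1)\ \cdots\ b_M(n-N+M+1)]^T \qquad (11.126)$$

The special structure in rows M + 1 to N of L and elements M + 1 to N of b(n) follows
from Eq. (11.115).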

In certain applications, such as acoustic echo cancellation, a value of M much smaller
than N may be used. In such cases, the computational burden of updating b(n) would be
negligible when compared with the total computational complexity of the whole system,
as only the first M + 1 samples of b(n) require updating. The rest of the elements of
b(n) are delayed versions of $b_M(n)$. Multiplication of $\hat{\mathbf{R}}_{bb}^{-1}$ by b(n) (according to Eq.
(11.124)) also requires only a small amount of computation. It involves estimation of
the powers of $b_0(n)$ through $b_M(n)$ and normalization of these samples by their power
estimates.

Multiplication of $\mathbf{L}^T$ by $\hat{\mathbf{R}}_{bb}^{-1}\mathbf{b}(n)$, to complete computation of u(n) (according to Eq.
(11.124)), however, is more involved, since a structure such as the lattice is not applicable.
It requires estimation of the elements of L and direct multiplication of $\mathbf{L}^T$ by $\hat{\mathbf{R}}_{bb}^{-1}\mathbf{b}(n)$.
Considering the forms of L and b(n), one finds that only the first M + 1 and the last M
elements of $\mathbf{L}^T\hat{\mathbf{R}}_{bb}^{-1}\mathbf{b}(n)$ need to be computed. The remaining elements of $\mathbf{L}^T\hat{\mathbf{R}}_{bb}^{-1}\mathbf{b}(n)$ are
delayed versions of its (M + 1)th element.

The following procedure may be used for estimating the elements of L, that is, the
coefficients of the predictors of orders 1 to M. An LMS-based adaptive lattice predictor is
used to obtain the PARCOR coefficients $\kappa_1, \kappa_2, \ldots, \kappa_M$ of x(n). The conversion algorithm
of Table 11.1 is then used to obtain the predictor coefficients of orders 1 to M. Table 11.6
summarizes this procedure.
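The conversion from PARCOR coefficients to predictor coefficients can be sketched as follows. The sketch assumes that the conversion referred to above is the standard Levinson-Durbin order update, $a_{m+1,j} = a_{m,j} - \kappa_{m+1}a_{m,m+1-j}$ with $a_{m+1,m+1} = \kappa_{m+1}$. The function name is illustrative.

```python
def parcor_to_predictors(kappas):
    """Return [a_1, ..., a_M], where a_m = [a_{m,1}, ..., a_{m,m}].

    kappas = [kappa_1, ..., kappa_M] are the PARCOR coefficients.
    """
    predictors = []
    a = []                                   # order-0 predictor has no taps
    for k in kappas:                         # k plays the role of kappa_{m+1}
        m = len(a)
        # a_{m+1,j} = a_{m,j} - kappa_{m+1} a_{m,m+1-j}; a_{m+1,m+1} = kappa_{m+1}
        a = [a[j] - k * a[m - 1 - j] for j in range(m)] + [k]
        predictors.append(a)
    return predictors
```

For instance, the PARCOR pair (2/3, -0.8) maps to the second-order predictor coefficients (1.2, -0.8). All M predictors are returned because the rows of L need the coefficients of every order from 1 to M.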
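Table 11.6 An LMS-based procedure for implementation of Algorithm 1.


Given: Parameter vectors $\boldsymbol{\kappa}(n) = [\kappa_1(n)\ \kappa_2(n)\ \cdots\ \kappa_M(n)]^T$
and $\mathbf{w}(n) = [w_0(n)\ w_1(n)\ \cdots\ w_{N-1}(n)]^T$,
data vectors $\mathbf{x}(n)$, $\mathbf{b}(n-1)$ and $\mathbf{u}(n-1)$, desired output $d(n)$,
and power estimates $P_0(n-1), P_1(n-1), \ldots, P_M(n-1)$.
Required: Vector updates $\boldsymbol{\kappa}(n+1)$, $\mathbf{w}(n+1)$, $\mathbf{b}(n)$ and $\mathbf{u}(n)$,
and power estimate updates $P_0(n), P_1(n), \ldots, P_M(n)$.


xxx Lattice Predictor xxx


$f_0(n) = b_0(n) = x(n)$
$P_0(n) = \beta P_0(n-1) + 0.5(1-\beta)[f_0^2(n) + b_0^2(n-1)]$
for m = 1 to M
    $f_m(n) = f_{m-1}(n) - \kappa_m(n) b_{m-1}(n-1)$
    $b_m(n) = b_{m-1}(n-1) - \kappa_m(n) f_{m-1}(n)$
    $\kappa_m(n+1) = \kappa_m(n) + \frac{2\mu_{p,o}}{P_{m-1}(n) + \epsilon}[f_{m-1}(n) b_m(n) + b_{m-1}(n-1) f_m(n)]$
    $P_m(n) = \beta P_m(n-1) + 0.5(1-\beta)[f_m^2(n) + b_m^2(n-1)]$
    if $|\kappa_m(n+1)| > \gamma$, $\kappa_m(n+1) = \kappa_m(n)$
end


xxx Conversion from Lattice to Transversal xxx


$\tilde{P}_0(n) = P_0(n)$
$a_{1,1}(n) = \kappa_1(n)$
$\tilde{P}_1(n) = (1 - \kappa_1^2(n))\tilde{P}_0(n)$
for m = 1 to M - 1
    $a_{m+1,j}(n) = a_{m,j}(n) - \kappa_{m+1}(n) a_{m,m+1-j}(n)$, for j = 1, 2, ..., m
    $a_{m+1,m+1}(n) = \kappa_{m+1}(n)$
    $\tilde{P}_{m+1}(n) = (1 - \kappa_{m+1}^2(n))\tilde{P}_m(n)$
end


xxx u(n) update xxx


$u_j(n) = u_{j-1}(n-1)$, for j = M + 1, M + 2, ..., N - M - 1,
where $u_j(n)$ is the jth element of $\mathbf{u}(n)$.

$\mathbf{b}_h(n) = \left[\frac{b_0(n)}{\tilde{P}_0(n)}\ \frac{b_1(n)}{\tilde{P}_1(n)}\ \cdots\ \frac{b_M(n)}{\tilde{P}_M(n)}\ \frac{b_M(n-1)}{\tilde{P}_M(n-1)}\ \cdots\ \frac{b_M(n-M)}{\tilde{P}_M(n-M)}\right]^T$
$[u_0(n)\ u_1(n)\ \cdots\ u_M(n)]^T = \mathbf{L}_h^T\mathbf{b}_h(n)$,
where $\mathbf{L}_h$ is the (2M + 1)-by-(M + 1) top-left part of $\mathbf{L}$.

$\mathbf{b}_t(n) = \tilde{P}_M^{-1}(n-N+M+1)\,[b_M(n-N+2M)\ b_M(n-N+2M-1)\ \cdots\ b_M(n-N+M+1)]^T$
$[u_{N-M}(n)\ u_{N-M+1}(n)\ \cdots\ u_{N-1}(n)]^T = \mathbf{L}_t^T\mathbf{b}_t(n)$,
where $\mathbf{L}_t$ is the M-by-M bottom-right part of $\mathbf{L}$.


xxx Filtering and tap-weight vector adaptation xxx


$y(n) = \mathbf{w}^T(n)\mathbf{x}(n)$
$e(n) = d(n) - y(n)$
$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n)\mathbf{u}(n)$
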
We have the following comments regarding the algorithm presented in Table 11.6.
The role of the constant $\epsilon$ in the PARCOR updating equation is to ensure stability of
the algorithm when $P_m(n)$ drops to very small values. Also, at every iteration, the
PARCOR coefficients are constrained to lie within a maximum magnitude $\gamma$. The predictor
coefficients, the $a_{m,j}$'s, and the backward errors obtained through the lattice predictor are
used to update the first M + 1 and the last M elements of the vector u(n). The iteration
suggested by Eq. (11.66) is used to obtain the estimates of the backward error powers.
These are denoted as $\tilde{P}_m(n)$'s in Table 11.6. The power estimates obtained in the lattice
predictor part of the algorithm, that is, the $P_m(n)$'s, could also be used. However, experiments
have shown that the use of the $\tilde{P}_m(n)$'s results in a more reliable algorithm. The vectors $\mathbf{b}_h(n)$
and $\mathbf{b}_t(n)$ denote the backward error vectors which correspond to the input samples at
the head and tail parts, respectively, of the tap-delay-line filter. When the input
signal to the filter is stationary, the elements of $\mathbf{b}_t(n)$ can be obtained by delaying the
output $b_M(n)$ of the lattice predictor at the head of x(n). This has been our assumption
in Table 11.6. When the filter input is nonstationary and the filter length, N, is large, one
may have to use a separate predictor for the samples at the tail of x(n).


Algorithm 2: Algorithm 1, although low in computational complexity, is structurally
complicated, as the implementation of the Levinson-Durbin algorithm and the ordering of
the manipulated data are not straightforward. This would not be much of a problem if a
DSP processor is used. Therefore, Algorithm 1 is suitable for software implementation.
However, if one is interested in a custom chip implementation, one may use Algorithm 2,
proposed below, which has a much simpler structure than Algorithm 1.

Algorithm 1 is not simple because it needs to update the parameters
of the lattice and transversal predictors of orders 1 to M at each time instant. Furthermore,
only the middle samples in u(n) could be obtained as delayed versions of earlier samples.
In Algorithm 2, we overcome these problems by extending the input and tap-weight
vectors, x(n) and w(n), to the vectors

$$\mathbf{x}_E(n) = [x(n+M)\ \cdots\ x(n+1)\ x(n)\ \cdots\ x(n-N+1)\ \cdots\ x(n-N-M+1)]^T$$

and

$$\mathbf{w}_E(n) = [w_{-M}(n)\ \cdots\ w_{-1}(n)\ w_0(n)\ \cdots\ w_{N-1}(n)\ \cdots\ w_{N+M-1}(n)]^T$$

respectively, and apply an LMS-Newton algorithm similar to Eq. (11.121) for updating
$\mathbf{w}_E(n)$. Since the tap weights of the original filter correspond to $w_0(n)$ through $w_{N-1}(n)$,
the first M and last M elements of $\mathbf{w}_E(n)$ may be frozen at 0. This can easily be done by
initializing these weights to zero and assigning a zero step-size parameter to all of them.
If this is done, the computation of the first M and last M elements of $\mathbf{L}^T\hat{\mathbf{R}}_{bb}^{-1}\mathbf{L}\mathbf{x}_E(n)$ (with
appropriate dimensions for $\mathbf{L}$ and $\hat{\mathbf{R}}_{bb}$) is immaterial and may be ignored. This results in
the following recursive equation for updating the adaptive filter tap weights:

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n)\mathbf{u}_a(n) \qquad (11.127)$$

where w(n) is the filter tap-weight vector as defined above, and

$$\mathbf{u}_a(n) = \mathbf{L}_2\hat{\mathbf{R}}_{bb}^{-1}\mathbf{L}_1\mathbf{x}_E(n) \qquad (11.128)$$

Here, $\hat{\mathbf{R}}_{bb}$ is a diagonal matrix compatible with the column vector $\mathbf{L}_1\mathbf{x}_E(n)$ and the diagonal
elements of $\hat{\mathbf{R}}_{bb}$ are estimates of the powers of the elements of the latter vector. The
matrices $\mathbf{L}_1$ and $\mathbf{L}_2$ are given by

$$\mathbf{L}_1 = \begin{bmatrix}
-a_{M,M} & -a_{M,M-1} & \cdots & -a_{M,1} & 1 & 0 & \cdots & 0 & 0 \\
0 & -a_{M,M} & \cdots & -a_{M,2} & -a_{M,1} & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & -a_{M,M} & \cdots & -a_{M,2} & -a_{M,1} & 1
\end{bmatrix} \qquad (11.129)$$

and

$$\mathbf{L}_2 = \begin{bmatrix}
1 & -a_{M,1} & \cdots & -a_{M,M} & 0 & \cdots & 0 & 0 \\
0 & 1 & -a_{M,1} & \cdots & -a_{M,M} & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & & & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 1 & -a_{M,1} & \cdots & -a_{M,M}
\end{bmatrix} \qquad (11.130)$$

The dimensions of $\mathbf{L}_1$ and $\mathbf{L}_2$ are (N + M)-by-(N + 2M) and N-by-(N + M), respectively.
The number of rows in $\mathbf{L}_2$ is only N as we do not want to compute the first M
and last M elements of $\mathbf{L}^T\hat{\mathbf{R}}_{bb}^{-1}\mathbf{L}\mathbf{x}_E(n)$.

Inspection of Eq. (11.128) reveals that each updating of $\mathbf{u}_a(n)$ requires only the updating
of the first element of the vector $\hat{\mathbf{R}}_{bb}^{-1}\mathbf{L}_1\mathbf{x}_E(n)$, and, then, the first element of the final
result, $\mathbf{u}_a(n)$. The rest of the elements of the two vectors are delayed versions of their first
elements. Putting these together, Figure 11.13 depicts a complete structure of Algorithm 2.
It consists of a backward prediction-error filter $H_{b_M}(z)$ whose coefficients, the $a_{M,i}(n)$'s, are
updated using an adaptive algorithm. The time index "n" is added to these coefficients
to emphasize their variability in time and their adaptation as input statistics may change.
Any adaptive algorithm may be used for adjustment of these coefficients. The successive
output samples from the backward prediction-error filter, that is, $b_M(n+M)$, make up the
elements of the column vector $\mathbf{L}_1\mathbf{x}_E(n)$. Multiplication of $b_M(n+M)$ by the inverse of
an estimate of its energy, denoted as $P_M^{-1}(n+M)$ in Figure 11.13, gives an update of
$\hat{\mathbf{R}}_{bb}^{-1}\mathbf{L}_1\mathbf{x}_E(n)$. Finally, filtering of the latter result by the next filter, whose coefficients
are duplicates of those of the backward prediction-error filter in reverse order, provides
the samples of the sequence $u_a(n)$, that is, the elements of the vector $\mathbf{u}_a(n)$. It is also
instructive to note that according to Eq. (11.81), the latter is nothing but the forward
equivalent of the backward prediction-error filter $H_{b_M}(z)$.
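The signal path just described (backward prediction-error filter, power normalization, then the same coefficients in reverse order) can be sketched for the idealized case in which the AR coefficients and the error power are known and held fixed. The sketch below assumes zero initial conditions and processes a whole sequence at once. The function name and arguments are illustrative, and no adaptation of the coefficients is included.

```python
def algorithm2_u(x, a, P_M, eps=0.0):
    """Sequence u_a(n) for fixed predictor coefficients a = [a_{M,1..M}].

    x is the input sequence, P_M the backward prediction error power and
    eps an optional regularization constant added to P_M.
    """
    M = len(a)

    def sample(seq, n):
        # samples before the start of the record are taken as zero
        return seq[n] if 0 <= n < len(seq) else 0.0

    # backward prediction-error filter: b_M(n) = x(n-M) - sum_i a_i x(n-M+i)
    b = [sample(x, n - M)
         - sum(a[i - 1] * sample(x, n - M + i) for i in range(1, M + 1))
         for n in range(len(x))]
    # power normalization
    g = [v / (P_M + eps) for v in b]
    # same coefficients in reverse order: forward prediction-error filter
    return [g[n] - sum(a[i - 1] * sample(g, n - i) for i in range(1, M + 1))
            for n in range(len(x))]
```

With a unit impulse as input, a = [0.5] and P_M = 1, the output begins -0.5, 1.25, -0.5. These are the entries of a middle row of the tridiagonal inverse correlation matrix of the corresponding AR(1) process, delayed by M = 1 sample.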

One may note that the filter output, y(n), is obtained at the time when x(n + M) is
available at the input of Figure 11.13. This is equivalent to saying that there is a delay
of M samples at the filter output as compared with the reference input. Although this
delay could easily be prevented by shifting the delay box, $z^{-M}$, from the filter input to
its output, we avoid this here to keep the analysis given in the next section as simple as
possible. Shifting the delay box to the filter output introduces a delay into the adjustment
loop of the filter. The result would then be a delayed LMS algorithm, which is known to
be inferior to its nondelayed version (Chapter 6). However, in the cases of interest, when
$M \ll N$, the difference between the two algorithms is negligible.

Table 11.7 presents an implementation of Algorithm 2 that follows Figure 11.13
closely, except that we assume the input to the backward prediction-error filter to be
x(n) instead of x(n + M). In this implementation, the backward prediction-error filter
$H_{b_M}(z)$ is implemented in lattice form. Also note that the power-normalization factor
$P_M^{-1}(n+M)$ is shifted to the output of the filter $z^{-M}H_{b_M}(z^{-1})$. Experiments have shown
that this amendment results in a more reliable algorithm.

Figure 11.13 Block schematic diagram depicting Algorithm 2.


11.15.2 Performance Analysis 


An analysis that reveals the differences between Algorithms 1 and 2 is presented. We
assume that the input process, x(n), is AR of order less than or equal to M. The predictor
coefficients, $\{a_{i,j}$, for i = 1, 2, ..., M and j = 1, 2, ..., i$\}$, and the corresponding
mean-squared prediction errors for different orders (i.e., the diagonal elements of $\mathbf{R}_{bb}$)
are assumed to be known. In practice, when M < N, these assumptions are acceptable
to a good approximation, as in that case the predictor coefficients will converge much
faster than the adaptive filter tap weights and they will be jittering near their optimum
setting after an initial transient. With these assumptions, one finds that u(n) is an exact
estimate of $\mathbf{R}_{xx}^{-1}\mathbf{x}(n)$ and, therefore, Algorithm 1 will be an exact implementation of
the ideal LMS-Newton algorithm, for which some theoretical results were presented in
Chapter 7. We consider these results here as a base which determines the best performance
that one may expect from Algorithm 1. Moreover, comparison of these results
with what would be achieved by Algorithm 2, under the same ideal conditions, gives a
good measure of the performance loss of Algorithm 2 as a result of the simplification made in
its structure.
Table 11.7 An LMS-based procedure for the implementation of Algorithm 2.


Given: Parameter vectors $\boldsymbol{\kappa}(n) = [\kappa_1(n)\ \kappa_2(n)\ \cdots\ \kappa_M(n)]^T$
and $\mathbf{w}(n) = [w_0(n)\ w_1(n)\ \cdots\ w_{N-1}(n)]^T$,
data vectors $\mathbf{x}(n)$, $\mathbf{b}(n-1)$ and $\mathbf{u}_a(n-1)$, desired output $d(n)$,
and power estimates $P_0(n-1), P_1(n-1), \ldots, P_M(n-1)$.
Required: Vector updates $\boldsymbol{\kappa}(n+1)$, $\mathbf{w}(n+1)$, $\mathbf{b}(n)$ and $\mathbf{u}_a(n)$,
and power estimate updates $P_0(n), P_1(n), \ldots, P_M(n)$.


xxx Lattice Predictor Part xxx


$f_0(n) = b_0(n) = x(n)$
$P_0(n) = \beta P_0(n-1) + 0.5(1-\beta)[f_0^2(n) + b_0^2(n-1)]$
for m = 1 to M
    $f_m(n) = f_{m-1}(n) - \kappa_m(n) b_{m-1}(n-1)$
    $b_m(n) = b_{m-1}(n-1) - \kappa_m(n) f_{m-1}(n)$
    $\kappa_m(n+1) = \kappa_m(n) + \frac{2\mu_{p,o}}{P_{m-1}(n) + \epsilon}[f_{m-1}(n) b_m(n) + b_{m-1}(n-1) f_m(n)]$
    $P_m(n) = \beta P_m(n-1) + 0.5(1-\beta)[f_m^2(n) + b_m^2(n-1)]$
    if $|\kappa_m(n+1)| > \gamma$, $\kappa_m(n+1) = \kappa_m(n)$
end


xxx $\mathbf{u}_a(n)$ Update xxx


$u_a(n-j) = u_a(n-j+1)$, for j = N - 1, N - 2, ..., 1
$f'_0(n) = b'_0(n) = b_M(n)$
for m = 1 to M - 1
    $f'_m(n) = f'_{m-1}(n) - \kappa_m(n) b'_{m-1}(n-1)$
    $b'_m(n) = b'_{m-1}(n-1) - \kappa_m(n) f'_{m-1}(n)$
end
$u_a(n) = (P_M(n) + \epsilon)^{-1}(f'_{M-1}(n) - \kappa_M(n) b'_{M-1}(n-1))$


xxx Filtering and tap-weight vector adaptation xxx


$y(n) = \mathbf{w}^T(n)\mathbf{x}(n-M)$
$e(n) = d(n-M) - y(n)$
$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n)\mathbf{u}_a(n)$


Under the ideal conditions stated above, the following results of the ideal LMS-Newton
algorithm (presented in Chapter 7) are applicable to Algorithm 1.


• The algorithm does not suffer from any eigenvalue spread problem. It has only one
mode of convergence, which is characterized by the time constant

$$\tau = \frac{1}{4\mu} \qquad (11.131)$$

• For small values of the step-size parameter, $\mu$, its misadjustment is given by the
equation

$$\mathcal{M}_1 = \frac{\mu N}{1 - \mu(N+2)} \qquad (11.132)$$

• To guarantee the stability of the algorithm, its step-size parameter should remain within
the limits

$$0 < \mu < \frac{1}{N+2} \qquad (11.133)$$

The derivation of the above results has been based on a number of assumptions which 
we shall also assume here before proceeding to the analysis of Algorithm 2. A modeling 
problem such as Figure 11.10 is considered and the following assumptions are made: 


1. The input samples, x(n), and the desired output samples, d(n), consist of jointly 
Gaussian-distributed random variables for all n. 

2. At time n, w(n) is independent of the input vector x(n) and the desired output sample 
d(n). 

3. Noise samples $e_o(n)$, for all n, are zero-mean and uncorrelated with the input samples,
x(n).


The validity of the second assumption is justified for small values of $\mu$, as discussed
in Chapter 6. For the analysis of Algorithm 2, we extend these assumptions by replacing
x(n) with $\mathbf{x}_E(n)$, so that they also cover the independence of $\mathbf{u}_a(n)$ and w(n).

Now, we proceed with an analysis of Algorithm 2. First, we present an analysis of
the convergence of w(n) in the mean, which gives a result similar to Eq. (11.131). Next,
we analyze the convergence of w(n) in the variance to obtain the misadjustment of the
algorithm. This analysis also reveals the effect of replacing u(n) by $\mathbf{u}_a(n)$.

Convergence of Tap-Weight Vector in the Mean: We look at the convergence of
E[w(n)] as n increases. To this end, we note that

$$e(n) = d(n) - \mathbf{w}^T(n)\mathbf{x}(n) = e_o(n) - \mathbf{v}^T(n)\mathbf{x}(n) \qquad (11.134)$$

where $\mathbf{v}(n) = \mathbf{w}(n) - \mathbf{w}_o$ is the weight-error vector and, from Figure 11.10, we have noted
that $d(n) = \mathbf{w}_o^T\mathbf{x}(n) + e_o(n)$. Substituting Eq. (11.134) in Eq. (11.127), we get

$$\mathbf{v}(n+1) = (\mathbf{I} - 2\mu\mathbf{u}_a(n)\mathbf{x}^T(n))\mathbf{v}(n) + 2\mu e_o(n)\mathbf{u}_a(n) \qquad (11.135)$$

where $\mathbf{I}$ denotes the identity matrix with appropriate dimension. Taking expectation and
using Assumptions 2 and 3 listed above, we obtain

$$E[\mathbf{v}(n+1)] = (\mathbf{I} - 2\mu E[\mathbf{u}_a(n)\mathbf{x}^T(n)])E[\mathbf{v}(n)] \qquad (11.136)$$

To evaluate $E[\mathbf{u}_a(n)\mathbf{x}^T(n)]$, we first define

$$\mathbf{u}_E(n) = \mathbf{R}_{x_Ex_E}^{-1}\mathbf{x}_E(n) \qquad (11.137)$$

where $\mathbf{R}_{x_Ex_E} = E[\mathbf{x}_E(n)\mathbf{x}_E^T(n)]$, and note that postmultiplying Eq. (11.137) by $\mathbf{x}_E^T(n)$ and
taking expectation on both sides gives

$$E[\mathbf{u}_E(n)\mathbf{x}_E^T(n)] = \mathbf{I} \qquad (11.138)$$

This shows that the cross-correlations between the elements of $\mathbf{u}_E(n)$ and $\mathbf{x}_E(n)$ that are at
the same position are unity, and those between elements at different positions are zero.
Clearly, this is also applicable to the elements of $\mathbf{u}_a(n)$ and x(n), as they are truncated
versions of $\mathbf{u}_E(n)$ and $\mathbf{x}_E(n)$, respectively. Thus,

$$E[\mathbf{u}_a(n)\mathbf{x}^T(n)] = \mathbf{I} \qquad (11.139)$$

and therefore,

$$E[\mathbf{v}(n+1)] = (1 - 2\mu)E[\mathbf{v}(n)] \qquad (11.140)$$

This shows that, similar to Algorithm 1, Algorithm 2 is also governed by a single mode
of convergence. Furthermore, the time constant of Eq. (11.131) is also applicable to
Algorithm 2.
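The link between the recursion of Eq. (11.140) and the time constant of Eq. (11.131) can be made explicit with a short calculation. The following is a sketch under the usual small step-size approximation, with the time constant understood as that of the learning curve (mean-squared error), as in the convention of Chapter 6:

```latex
\begin{aligned}
E[\mathbf{v}(n)] &= (1-2\mu)^n\,E[\mathbf{v}(0)] \approx e^{-2\mu n}\,E[\mathbf{v}(0)], \qquad \mu \ll 1, \\
\text{excess MSE} &\propto \left(e^{-2\mu n}\right)^2 = e^{-4\mu n} = e^{-n/\tau}, \qquad \tau = \frac{1}{4\mu}.
\end{aligned}
```

The mean weight error therefore decays with a time constant of about $1/(2\mu)$ iterations, while the mean-squared error, being quadratic in the weight error, decays twice as fast.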

Convergence of Tap-Weight Vector in the Mean Square: We first develop a recursive
equation for the time evolution of the correlation matrix of the weight-error vector v(n),
which is defined as $\mathbf{K}(n) = E[\mathbf{v}(n)\mathbf{v}^T(n)]$. For this, we find the outer products of the left-
and right-hand sides of Eq. (11.135) and take expectation on both sides of the resulting
equation. Then, using Assumptions 2 and 3 listed above, we obtain

$$\begin{aligned}
\mathbf{K}(n+1) &= E[(\mathbf{I} - 2\mu\mathbf{u}_a(n)\mathbf{x}^T(n))\mathbf{K}(n)(\mathbf{I} - 2\mu\mathbf{x}(n)\mathbf{u}_a^T(n))] + 4\mu^2\xi_{\min}\mathbf{R}_{u_au_a} \\
&= \mathbf{K}(n) - 2\mu E[\mathbf{u}_a(n)\mathbf{x}^T(n)]\mathbf{K}(n) - 2\mu\mathbf{K}(n)E[\mathbf{x}(n)\mathbf{u}_a^T(n)] \\
&\quad + 4\mu^2E[\mathbf{u}_a(n)\mathbf{x}^T(n)\mathbf{K}(n)\mathbf{x}(n)\mathbf{u}_a^T(n)] + 4\mu^2\xi_{\min}\mathbf{R}_{u_au_a} \\
&= (1 - 4\mu)\mathbf{K}(n) + 4\mu^2E[\mathbf{u}_a(n)\mathbf{x}^T(n)\mathbf{K}(n)\mathbf{x}(n)\mathbf{u}_a^T(n)] + 4\mu^2\xi_{\min}\mathbf{R}_{u_au_a}
\end{aligned} \qquad (11.141)$$

where $\xi_{\min} = E[e_o^2(n)]$ is the minimum mean-squared error at the adaptive filter output
and $\mathbf{R}_{u_au_a} = E[\mathbf{u}_a(n)\mathbf{u}_a^T(n)]$.

The second term on the right-hand side of Eq. (11.141) can be evaluated by following
a procedure similar to the one given in Chapter 6, Appendix 6A, for the case of the
conventional LMS algorithm. This results in (see Appendix 11A for the derivation)

$$E[\mathbf{u}_a(n)\mathbf{x}^T(n)\mathbf{K}(n)\mathbf{x}(n)\mathbf{u}_a^T(n)] = \mathbf{R}_{u_au_a}\mathrm{tr}[\mathbf{K}(n)\mathbf{R}_{xx}] + 2\mathbf{K}(n) \qquad (11.142)$$

Using this result in Eq. (11.141), we obtain

$$\mathbf{K}(n+1) = (1 - 4\mu + 8\mu^2)\mathbf{K}(n) + 4\mu^2\mathbf{R}_{u_au_a}\mathrm{tr}[\mathbf{K}(n)\mathbf{R}_{xx}] + 4\mu^2\xi_{\min}\mathbf{R}_{u_au_a} \qquad (11.143)$$

Next, we recall from Chapter 6 that the excess mean-squared error of an adaptive filter
with input and weight-error correlation matrices $\mathbf{R}_{xx}$ and $\mathbf{K}(n)$, respectively, is given by

$$\xi_{ex}(n) = \mathrm{tr}[\mathbf{K}(n)\mathbf{R}_{xx}] \qquad (11.144)$$

Postmultiplying Eq. (11.143) on both sides by $\mathbf{R}_{xx}$ and equating the traces of the two
sides of the resulting equation, we obtain

$$\xi_{ex}(n+1) = \left(1 - 4\mu + 4\mu^2(\mathrm{tr}[\mathbf{R}_{u_au_a}\mathbf{R}_{xx}] + 2)\right)\xi_{ex}(n) + 4\mu^2\xi_{\min}\mathrm{tr}[\mathbf{R}_{u_au_a}\mathbf{R}_{xx}] \qquad (11.145)$$

From Eq. (11.145), we note that the convergence of Algorithm 2 is guaranteed if

$$\left|1 - 4\mu + 4\mu^2(\mathrm{tr}[\mathbf{R}_{u_au_a}\mathbf{R}_{xx}] + 2)\right| < 1 \qquad (11.146)$$

This gives

$$0 < \mu < \frac{1}{\mathrm{tr}[\mathbf{R}_{u_au_a}\mathbf{R}_{xx}] + 2} \qquad (11.147)$$

Also, when $n \to \infty$, $\xi_{ex}(n+1) = \xi_{ex}(n)$. Using this in Eq. (11.145), we obtain

$$\mathcal{M}_2 = \frac{\xi_{ex}(\infty)}{\xi_{\min}} = \frac{\mu\,\mathrm{tr}[\mathbf{R}_{u_au_a}\mathbf{R}_{xx}]}{1 - \mu(\mathrm{tr}[\mathbf{R}_{u_au_a}\mathbf{R}_{xx}] + 2)} \qquad (11.148)$$

This is the misadjustment equation for Algorithm 2. The above results reduce to those of
Algorithm 1 if $\mathbf{R}_{u_au_a}$ is replaced by $\mathbf{R}_{uu} = E[\mathbf{u}(n)\mathbf{u}^T(n)]$ and one notes that $\mathbf{R}_{uu} = \mathbf{R}_{xx}^{-1}$.

In view of Eqs. (11.132) and (11.148), a good measure for comparing Algorithms 1
and 2 is the ratio

$$\gamma = \frac{\mathrm{tr}[\mathbf{R}_{u_au_a}\mathbf{R}_{xx}]}{N} \qquad (11.149)$$

A value of $\gamma > 1$ indicates that Algorithm 1 performs better than Algorithm 2. Furthermore,
the larger the value of $\gamma$, the greater would be the loss in replacing Algorithm 1
by Algorithm 2. However, if $\gamma \approx 1$, then the two algorithms perform about the same.
An evaluation of the parameter $\gamma$ is provided in Appendix 11B. It is shown that $\gamma$ is
always greater than unity. This means that there is always a penalty to be paid for the
simplification made in replacing the vector u(n) of Algorithm 1 by the vector $\mathbf{u}_a(n)$ of
Algorithm 2. The amount of loss depends on the statistics of the input process, x(n), and
the filter length N. Fortunately, the evaluation provided in Appendix 11B shows that $\gamma$
approaches one as N increases. This means that the difference between the two algorithms
may be insignificant for long filters. Numerical examples which verify this are given next.
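The misadjustment expressions for the two algorithms differ only in the trace term, so their comparison reduces to a few lines of arithmetic. The sketch below encodes the misadjustment formula together with the corresponding step-size bound. The function names are illustrative, and the value of the ratio used in the example (2.0) is purely hypothetical, not a figure taken from the simulations.

```python
def misadjustment(mu, trace):
    """mu * trace / (1 - mu * (trace + 2)).

    trace stands for tr[R_uu R_xx]: N for Algorithm 1 and gamma * N for
    Algorithm 2. Raises ValueError outside the stability range.
    """
    bound = 1.0 / (trace + 2.0)
    if not 0.0 < mu < bound:
        raise ValueError("step size outside the stability range")
    return mu * trace / (1.0 - mu * (trace + 2.0))

def compare(N, gamma, mu):
    """Misadjustments of Algorithm 1 and Algorithm 2 for the same step size."""
    m1 = misadjustment(mu, N)
    m2 = misadjustment(mu, gamma * N)
    return m1, m2
```

With N = 30 and a step size of 0.1/N, `compare(30, 2.0, 0.1 / 30)` gives roughly 0.112 for Algorithm 1 and roughly 0.252 for Algorithm 2. The penalty therefore grows somewhat faster than the ratio itself.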


11.15.3 Simulation Results and Discussion 


We present some simulation results using the input process x_1(n), which was introduced
in Section 11.14, and also two other processes, x_2(n) and x_3(n), which are generated
using the coloring filters

$$H_2(z) = \frac{K_2}{1 - 0.650z^{-1} + 0.693z^{-2} - 0.220z^{-3} + 0.309z^{-4} - 0.177z^{-5}} \tag{11.150}$$

and

$$H_3(z) = \frac{K_3}{1 - 2.059z^{-1} + 2.312z^{-2} - 1.893z^{-3} + 1.148z^{-4} - 0.293z^{-5}} \tag{11.151}$$

respectively. The coefficients K_2 and K_3 are selected equal to 0.7208 and 0.2668,
respectively, to normalize the resulting processes to unit power. We note that x_2(n) and
x_3(n) are AR processes, but x_1(n) is not.

To verify the theoretical results presented above, we start with some experiments using
the AR processes x_2(n) and x_3(n). Figure 11.14 shows the power spectral densities of
x_2(n) and x_3(n). From Chapter 4, we recall that the eigenvalue spread of the correlation
matrix of a process is asymptotically determined by the maximum and minimum of its
power spectral density. Noting this, we find that the eigenvalue spread of x_2(n) is in the
range of 100 and that of x_3(n) can be as large as 10,000. This shows that x_3(n) is a very
badly conditioned process and one should expect difficulties in estimating the inverse of
its correlation matrix.

Figure 11.14 Power spectral densities of x_2(n) (AR2) and x_3(n) (AR3).
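
The link between the power spectral density and the eigenvalue spread recalled above can be
explored numerically. The following is a minimal sketch, not the program used for the book's
figures: the helper names `ar_psd`, `spread_estimate`, and `ar_power` are hypothetical, the
coefficient list is copied from Eq. (11.150) as read from the text, and the max-to-min ratio
of the sampled spectrum is only the asymptotic estimate of the eigenvalue spread.

```python
import cmath
import math


def ar_psd(gain, den, num_points=1024):
    """PSD of gain / A(z) driven by unit-variance white noise, sampled on [0, pi].

    den = [1, a1, ..., aM] holds the coefficients of A(z) = 1 + a1 z^-1 + ... + aM z^-M.
    """
    samples = []
    for k in range(num_points + 1):
        omega = math.pi * k / num_points
        a_of_omega = sum(c * cmath.exp(-1j * omega * i) for i, c in enumerate(den))
        samples.append(gain ** 2 / abs(a_of_omega) ** 2)
    return samples


def spread_estimate(gain, den, num_points=1024):
    """Max-to-min PSD ratio, the asymptotic eigenvalue spread recalled from Chapter 4."""
    psd = ar_psd(gain, den, num_points)
    return max(psd) / min(psd)


def ar_power(gain, den, num_terms=5000):
    """Output power as the energy of the truncated impulse response (den[0] must be 1)."""
    h = []
    for n in range(num_terms):
        value = gain if n == 0 else 0.0
        for i in range(1, min(n, len(den) - 1) + 1):
            value -= den[i] * h[n - i]
        h.append(value)
    return sum(v * v for v in h)


# Coefficients as read from Eq. (11.150); the printed values are not asserted here.
H2_DEN = [1.0, -0.650, 0.693, -0.220, 0.309, -0.177]
print("x2 spread estimate:", spread_estimate(0.7208, H2_DEN))
print("x2 power estimate:", ar_power(0.7208, H2_DEN))
```

For a first-order model with A(z) = 1 - 0.5z^-1 the two helpers can be checked by hand: the
spectrum peaks at 4 and bottoms out at 1/2.25, so the ratio is 9, and the power is 1/(1 - 0.25).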

To shed light on the differences between Algorithm 1 and Algorithm 2, we first present
some simulation results for the case when the exact models of the AR inputs are known
a priori. In this case, Algorithm 1 will be an exact implementation of the LMS-Newton
algorithm and gives a good base for further comparisons. Figure 11.15 shows the variation
of the parameter γ as a function of the filter length, N, for x_2(n) and x_3(n). As one may
expect, the process x_3(n), which is suffering from a serious eigenvalue spread problem,
shows higher sensitivity toward replacing Algorithm 1 by Algorithm 2. However, as N
increases, γ approaches 1 and, therefore, the two algorithms are expected to perform
about the same.

Figure 11.15 Variation of the parameter γ as a function of filter length for x_2(n) (AR2)
and x_3(n) (AR3).

Figure 11.16 MSE versus iteration number for x_2(n), N = 30, with the AR model of the input
assumed known.

Figure 11.17 MSE versus iteration number for x_3(n), N = 30, with the AR model of the input
assumed known.

Figures 11.16 and 11.17 show the simulation results for the inputs x_2(n) and x_3(n) and
a filter length N = 30. These results, as well as those presented in the rest of this section,
are averaged over 50 independent runs. The results are then smoothed so that the various
curves can be distinguished. The step-size parameter, μ, is selected equal to 0.1/N for
all the results. This, according to Eq. (11.132), results in about 10% misadjustment for
Algorithm 1. According to the results of Figure 11.15 and Eqs. (11.132) and (11.148),
both algorithms should approach about the same misadjustment in the case of x_2(n).
However, their performance may be significantly different in the case of x_3(n). To be
more exact, from the data used for generating Figure 11.15, we have γ = 1.13 for x_2(n)
and γ = 5.57 for x_3(n), for N = 30. Using these and Eqs. (11.132) and (11.148), we
obtain the following:

For x_2(n): M_2/M_1 = 1.147.
For x_3(n): M_2/M_1 = 11.32.

Here, M_1 and M_2 denote the misadjustments of Algorithm 1 and Algorithm 2, respectively.

Careful examination of the numerical values obtained by simulations shows that, for the
x_2(n) process, M_2/M_1 = 1.152. This matches well with the above ratio. However, for the
x_3(n) process, the simulation results give M_2/M_1 = 3.85. This, which does not match the
above theoretical ratio, may be explained as follows. Careful examination of the numerical
results in simulations reveals that there are only a few terms in u_a(n) that have a major
effect on the degradation of Algorithm 2, when compared with Algorithm 1. These terms,
which greatly disturb the first and last few elements of the tap-weight vector w(n), are so
large that their contribution violates the independence assumption 2 of the previous
section. As a result, the theoretical derivation that led to Eq. (11.148) may not be valid
unless the step-size parameter μ is set to a very small value so that the latter assumption
can be justified. Nevertheless, the developed theory is able to predict the conditions under
which Algorithm 2 is more likely to go unstable; namely, when the adaptive filter input is
highly colored.

Figure 11.18 MSE versus iteration number for x_3(n), N = 200, with the AR model of the
input assumed known.
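
The theoretical ratios quoted above (1.147 and 11.32) can be reproduced with a few lines of
arithmetic. The sketch below assumes that the misadjustment expressions referred to as
Eqs. (11.132) and (11.148) take the forms μN/(1 - μN) and γμN/(1 - γμN), respectively; these
forms are an assumption made here because they match the quoted numbers, and the function
names are hypothetical.

```python
def misadjustment(mu, n_taps, gamma=1.0):
    """Assumed form gamma*mu*N / (1 - gamma*mu*N); gamma = 1 corresponds to Algorithm 1."""
    load = gamma * mu * n_taps
    if load >= 1.0:
        raise ValueError("gamma * mu * N must stay below 1 for this expression to apply")
    return load / (1.0 - load)


def misadjustment_ratio(gamma, mu, n_taps):
    """Misadjustment of Algorithm 2 divided by that of Algorithm 1."""
    return misadjustment(mu, n_taps, gamma) / misadjustment(mu, n_taps)


N = 30
MU = 0.1 / N
for label, gamma in (("x2", 1.13), ("x3", 5.57)):
    print(label, misadjustment_ratio(gamma, MU, N))
```

With μ = 0.1/N the Algorithm 1 value is 0.1/0.9, or about 11%, which is in line with the
"about 10%" figure in the text. Note that the assumed expression stops being meaningful once
γμN reaches 1, which is one way to see why a large γ pushes Algorithm 2 toward instability.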

To support the prediction made by the theory that the two algorithms perform about the
same for long filters, we present another simulation example with the process x_3(n) as
the filter input. This time we increase the length of the filter, N, to 200. Figure 11.18
shows the results of this test. For this scenario, the theory gives a misadjustment ratio
of 1.69 and simulation gives 1.64, a good match, as was predicted.

Next, we present simulation results for the more realistic case in which the input process
is unknown and its model has to be estimated along with the adaptive filter tap weights.
We present some results for Algorithm 2. The simulation program that we use follows
Table 11.7. The following parameters are used: β = 0.95, γ = 0.9, ε = 0.02, μ_{p,o} = 0.01,
μ = 0.1/N, and N = 30. Figure 11.19a, b, and c shows the simulation results for the
processes x_1(n), x_2(n), and x_3(n), respectively. The results are given for the conventional
LMS algorithm and Algorithm 2, for the cases where the order of AR model, M, is 
set equal to 1, 2, 3, and 5. The results clearly show the improvement achieved by the 
AR modeling. We note that for x(n) and x3(n), even a first-order modeling of the input 
processes results in significant improvement in convergence, compared to the conventional 
LMS algorithm. However, for x(n) a modeling order of 2 or above is required to achieve 
some improvement. 
Figure 11.19 Comparison of the conventional LMS and Algorithm 2, for different inputs and
various orders of AR model: (a) input process x_1(n), (b) input process x_2(n), and (c) input
process x_3(n).
Problems
P11.1 Give a detailed proof of Eq. (11.52) and show that this proof leads to Eq. (11.53).
P11.2 Define the m-by-m matrix J whose ijth element is 1 for j = m − i + 1 and 0
 for all other i, j ∈ {1, 2, ..., m}. This is called an exchange matrix. Show that
 if R is the correlation matrix of a stationary stochastic process, then JRJ = R.
 Use this result to derive an alternate proof for Eq. (11.19) or Eq. (11.20).
P11.3 Using the procedure mentioned in Problem P4.8, show that for the matrix L as
 defined in Eq. (11.70)
 det(L) = 1
 Comment on the invertibility of L.
P11.4 Give a detailed derivation of Eq. (11.81).
P11.5 Consider the order-update equations (11.60) and (11.61). Show that the optimization
 of κ_{m+1} in either of the two equations for minimization of the corresponding
 higher order errors in the mean-square sense gives Eq. (11.59).
P11.6 Equation (11.90) may be rearranged as

$$r(m+1) = P_m \kappa_{m+1} + \sum_{i=1}^{m} a_{m,i}\, r(m+1-i)$$

 Use this result to develop procedures for the following:
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P11.7 


P11.8 


P11.9 


P11.10 


P11.11 


P11.12 


P11.13 


P11.14 


P11.15 


1. Conversion of the set of coefficients (P_0, κ_1, κ_2, ..., κ_M) to (r(0), r(1), ...,
r(M)).

2. Conversion of the set of coefficients (P_0, a_{M,1}, a_{M,2}, ..., a_{M,M}) to (r(0),
r(1), ..., r(M)).


Derive the recursion used for obtaining the transversal predictor coefficients w; 
in Table 11.4. 


Give the lattice equivalent of the forward prediction-error filter which is char- 
acterized by the system function 


Hp) = 1 — 0.57? + 0.577? + 0.2524 


Give the transversal equivalent of the third-order forward and backward 
prediction-error filters of a process which is characterized by the PARCOR 
coefficients 


k = 0.8 


Find the lattice realization of the system function 
H(z) = ————_ 

@) 1—az"! 

Find the lattice realization of the system function 

1 
1 — 1.5z7! 0.5607 
Find the lattice realization of the system function 
1+7! +277? 
1 — 1.2z-! + 0.527? 


Use the Levinson—Durbin algorithm to find the fourth-order transversal and 
lattice predictors of a process x(n), which is characterized by the correlation 
coefficients 


H(z)= 


H(z) = 


r0 =5, r0 =3, rQ=-l, r@=2, r4 =-0.5 


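Use the Levinson-Durbin algorithm to solve the system of equations

$$\begin{bmatrix} 1.0 & 0.8 & -0.5 & 0.2 \\ 0.8 & 1.0 & 0.8 & -0.5 \\ -0.5 & 0.8 & 1.0 & 0.8 \\ 0.2 & -0.5 & 0.8 & 1.0 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 0.8 \\ -0.5 \\ 0.2 \\ 0 \end{bmatrix}$$

Use the extended Levinson-Durbin algorithm to solve the system of equations

$$\begin{bmatrix} 1.0 & 0.8 & -0.5 & 0.2 \\ 0.8 & 1.0 & 0.8 & -0.5 \\ -0.5 & 0.8 & 1.0 & 0.8 \\ 0.2 & -0.5 & 0.8 & 1.0 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 1 \\ 0.5 \\ 0 \end{bmatrix}$$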
P11.16 Consider the case where the input x(n) to a lattice predictor is an AR process
 obtained by passing a unit variance white noise through a system with the transfer
 function

$$H(z) = \frac{1}{1 - 1.1z^{-1} + 0.3z^{-2}}$$

 (i) Find the correlation coefficients r(0), r(1), r(2), ... of x(n).
 (ii) Using the results of (i), find the PARCOR coefficients κ_i of the lattice
 predictor and show that κ_i = 0, for i > 2.
 (iii) Could you predict the fact that κ_i = 0, for i > 2, without performing the
 calculations in (ii)? Explain.
 (iv) The result in (ii) implies that the backward prediction error b_2(n) is a white
 process. Prove this. What is the variance of b_2(n)?

P11.17 Consider the case where the input x(n) to a lattice predictor is a moving average
 (MA) process obtained by passing a unit variance white noise through a system
 with the transfer function

$$H(z) = 1 - 1.1z^{-1} + 0.3z^{-2}$$

 (i) Find the correlation coefficients r(0), r(1), r(2), ... of x(n).
 (ii) Using the results of (i), find the PARCOR coefficients κ_i of the lattice
 predictor, for i = 1 through 6.
 (iii) You should have found in (ii) that |κ_i| decreases as i increases. Using this
 observation, can you argue that as i increases, b_i(n) approaches a white
 noise with unit variance? Explain.

P11.18 Repeat Problem P11.17 for the case where

$$H(z) = \frac{1 - 1.1z^{-1} + 0.3z^{-2}}{1 - 0.3z^{-1}}$$

P11.19 Consider an AR process which is described by the difference equation

 x(n) = 0.6x(n − 1) + 0.67x(n − 2) − 0.36x(n − 3) + v(n)

 where v(n) is a zero-mean white noise process with variance of unity.

 (i) Find the system function H(z) which relates v(n) and x(n).
 (ii) Show that the poles of H(z) are 0.9, −0.8, and 0.5.
 (iii) Find the power of x(n).
 (iv) Find the PARCOR coefficients κ_1, κ_2, and κ_3 of x(n).
 (v) Find the prediction-error powers P_1, P_2, and P_3 of x(n).
 (vi) Comment on the values of κ_m and P_m, for values of m ≥ 4.
 (vii) Using the procedure developed in Problem P11.6, find the autocorrelation
 coefficients r(0), r(1), ..., r(5) of x(n).

P11.20 For a real-valued process x(n) with mth-order forward and backward prediction
 errors f_m(n) and b_m(n), respectively, prove the following results.
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P11.21 


P11.22 


P11.23 


P11.24 


P11.25 


(i) ELfs(n) fhn- 2)] = 0 
(ii) E[b5(n)by(n — 2)] = 0 
(iii) Elfa x] = EL fn @)] 
(iv) El, (n)x(n — m)] = Eb} (n)] 
(v) For 0 < k < m, E| fa n) fn- (n — k) =0 
(vi) For 0 < k < m, E[b,,(n)b,,_,(n — k)] = E[b2.(n)] 


m 


For a real-valued process x(n) with mth-order forward and backward prediction 
errors f„(n) and b,,(n), respectively, find the range of i for which the following 
results hold. 


(i) El fin@) fm- —i)] = 0 
(ii) E[b,,(n)b,,_,(n — i)] =0 
(iii) ELf,(™b, y(n —i)] =0 
(iv) EDn (2) fy_x(n — i] = 0 


Consider a complex-valued process x(n).

 (i) Using the principle of orthogonality, derive the Wiener-Hopf equation that
 gives the coefficients a_{m,i} of the order-m forward linear predictor of x(n).
 (ii) Repeat (i) to derive the coefficients g_{m,i} of the order-m backward linear
 predictor of x(n).
 (iii) Show that

$$g_{m,i} = a^{*}_{m,m+1-i}, \quad \text{for } i = 1, 2, \ldots, m$$


In the case of complex-valued signals, the order-update equations (11.60) and
(11.61) take the following forms:

$$f_{m+1}(n) = f_m(n) - \kappa_{m+1} b_m(n-1)$$

and

$$b_{m+1}(n) = b_m(n-1) - \kappa^{*}_{m+1} f_m(n)$$

where the asterisk denotes complex conjugation. Give a detailed proof of these
equations and show that

$$\kappa_{m+1} = \frac{E[f_m(n)\, b^{*}_m(n-1)]}{P_m}$$

where P_m = E[|f_m(n)|^2] = E[|b_m(n − 1)|^2].


For the case of complex-valued signals, propose a lattice joint process estimator 
similar to Figure 11.7 and develop an LMS algorithm for its adaptation. 


For a complex-valued process x(n), prove the following properties:

 (i) E[f_m(n)x*(n − k)] = 0, for 1 ≤ k ≤ m
 (ii) E[b_m(n)x*(n − k)] = 0, for 0 ≤ k ≤ m − 1
 (iii) E[b_k(n)b_l*(n)] = 0, for k ≠ l
P11.26 Consider the difference equation (11.114) and note that it may be written as

$$x(n) = \mathbf{x}^{\mathrm{T}}(n-1)\mathbf{h} + v(n) \tag{P11.26.1}$$

 where x(n − 1) = [x(n − 1) x(n − 2) ··· x(n − M)]^T and h = [h_1 h_2 ··· h_M]^T.

 (i) Starting with Eq. (P11.26.1), show that the vector h is related to the
 autocorrelation coefficients r(0), r(1), ..., r(M) of x(n) according to the equation

$$\mathbf{R}_M \mathbf{h} = \mathbf{r}_M$$

 where

$$\mathbf{R}_M = \begin{bmatrix} r(0) & r(1) & \cdots & r(M-1) \\ r(1) & r(0) & \cdots & r(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(M-1) & r(M-2) & \cdots & r(0) \end{bmatrix}$$

 and

$$\mathbf{r}_M = [r(1)\ r(2)\ \cdots\ r(M)]^{\mathrm{T}}$$

 (ii) Show that for any m

$$r(m) = \sum_{i=1}^{M} h_i\, r(m-i)$$

 (iii) By combining the results of (i) and (ii), show that for any M′ > M,

$$\mathbf{R}_{M'} \mathbf{h}' = \mathbf{r}_{M'}$$

 where h′ = [h^T 0^T]^T and 0, here, is the length M′ − M zero column vector.
 (iv) Use the above results to justify the validity of the results presented in Eqs.
 (11.115) and (11.116).


P11.27 Suggest a lattice structure for realization of the transfer function W (z) of the IIR 
line enhancer of Section 10.3 and obtain its coefficients in terms of the param- 
eters s and w. 


P11.28 Give a detailed derivation of Eq. (11.123). 
P11.29 Give a derivation of Eq. (11.147) from Eq. (11.146). 


P11.30 Recall that the unconstrained PFBLMS algorithm of Chapter 8 converges very
 slowly when the partition length, M, and the block length, L, are equal. In
 Section 8.4, it was noted that the slow convergence of the PFBLMS algorithm can
 be improved by choosing M a few times larger than L. We also noticed that the
 frequency domain processing involved in the implementation of the PFBLMS
 algorithm may be viewed as a parallel bank of a number of transversal filters,
 each belonging to one of the frequency bins. Furthermore, when the number
 of frequency bins is large, these filters operate (converge) almost independently of
 one another. Noting these, the following alternative solution may be proposed to
 improve the convergence behavior of the PFBLMS algorithm. We may keep L = M
 and use a lattice structure for decorrelating the samples of the input signal at
 each frequency bin. Explore this solution. In particular, note that when L = M,
 the autocorrelation coefficients of the signal samples at various frequency bins
 are known a priori. Explain how this known information can be exploited in
 the proposed implementation.


Computer-Oriented Problems 


P11.31 Write a simulation program to confirm the results presented in Figure 11.12. If
 you are looking for a short-cut and you have access to the MATLAB software
 package, you may study and use the program 1tc_mdlg.m on the accompanying
 website. Also, run 1tc_mdlg.m or your program for the following values
 of the step-size parameters μ_{p,o} and μ_{c,o} and observe the impact of those on
 the performance of the algorithm. Comment on your observations.

 μ_{p,o}    μ_{c,o}
 0.01       0.003
 0.001      0.010
 0.001      0.001
 0.0001     0.003

P11.32 Consider a process x(n) which is characterized by the difference equation

 x(n) = 1.2x(n − 1) − 0.8x(n − 2) + v(n) + αv(n − 1)

 where α is a parameter and v(n) is a zero-mean unit variance white process.

 (i) Derive an equation for the power spectral density, Φ_xx(e^{jω}), of x(n), and
 plot that for values of α = 0, 0.5, 0.8, and 0.95.

 (ii) The autocorrelation coefficients r(0), r(1), ... of x(n) can be numerically
 obtained by evaluating the inverse discrete Fourier transform (DFT) of samples
 of Φ_xx(e^{jω}) taken at equally spaced intervals. The number of samples
 of Φ_xx(e^{jω}) used for this purpose should be large enough to give an accurate
 result. Use this method to obtain the autocorrelation coefficients of
 x(n).

 (iii) Use the results of (ii) to obtain the system functions of the backward
 prediction-error filters of x(n) for predictor orders of 2, 5, 10, and 20,
 and values of α = 0, 0.5, 0.8, and 0.95.

 (iv) Plot the power spectral density Φ_{b_m b_m}(e^{jω}) of the order m backward
 prediction error b_m(n) for values of m = 2, 5, 10, and 20 and comment on
 your observations. In particular, you should find that for all values of α,
 b_m(n) becomes closer to white as m increases. However, for values of α
 closer to unity, a larger m is required to obtain a white b_m(n). Explain this
 observation.
P11.33
 (i) On the basis of algorithms provided in Tables 11.6 and 11.7, develop
 and run simulation programs to confirm the results presented in
 Figures 11.16-11.19. If you are looking for a short-cut and you have
 access to the MATLAB software package, you may start with the programs
 ar_m_all.m and ar_m_al2.m on the accompanying website. Note
 that these programs give only the core of your implementations. You
 need to study and amend them accordingly to get the results that you are
 looking for.

 (ii) Use the process x(n) of P11.32 as the adaptive filter input and try that for
 values of α = 0, 0.5, 0.8, and 0.95, and various values of the AR modeling
 order, that is, the parameter M. Comment on your observations. To get
 further insight, you may need to evaluate the parameter γ of Eq. (11.149)
 for each case. Write a program to generate γ. For this you need to calculate
 the autocorrelation functions of x(n) and u_a(n) and use them to build the
 matrices R_xx and R_{u_a u_a} required in Eq. (11.149). These can conveniently
 be obtained by calculating the inverse DFT of the samples of the power
 spectral densities of the corresponding processes as discussed in Part (ii)
 of Problem P11.32.
Appendix 11A: Evaluation of E[u_a(n)x^T(n)K(n)x(n)u_a^T(n)]
First, we note that

$$\mathbf{x}^{\mathrm{T}}(n)\mathbf{K}(n)\mathbf{x}(n) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x(n-i)\,x(n-j)\,k_{ij}(n) \tag{11A.1}$$

is a scalar, with k_{ij}(n) denoting the ijth element of K(n). Also,
C(n) ≜ u_a(n)x^T(n)K(n)x(n)u_a^T(n) is an N-by-N matrix whose lmth element is

$$c_{lm}(n) = u_a(n-l)\,u_a(n-m)\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x(n-i)\,x(n-j)\,k_{ij}(n) \tag{11A.2}$$

Taking statistical expectation of c_{lm}(n), we obtain

$$E[c_{lm}(n)] = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} E[u_a(n-l)\,u_a(n-m)\,x(n-i)\,x(n-j)]\,k_{ij}(n) \tag{11A.3}$$

Now, recall the assumption that the input samples x(n) are a set of jointly Gaussian
random variables. Since for any set of real-valued jointly Gaussian random variables x_1,
x_2, x_3, and x_4

$$E[x_1x_2x_3x_4] = E[x_1x_2]E[x_3x_4] + E[x_1x_3]E[x_2x_4] + E[x_1x_4]E[x_2x_3] \tag{11A.4}$$

we obtain

$$E[u_a(n-l)\,u_a(n-m)\,x(n-i)\,x(n-j)] = r^{u_au_a}_{lm}\, r^{xx}_{ij} + \delta(l-i)\delta(m-j) + \delta(l-j)\delta(m-i) \tag{11A.5}$$

where δ(·) is the Kronecker delta function, and r^{u_a u_a}_{lm} and r^{xx}_{ij} are the lmth
and ijth elements of the correlation matrices R_{u_a u_a} and R_xx, respectively. In deriving
this result, we have noted that

$$E[u_a(n-l)\,x(n-i)] = \begin{cases} 1, & l = i \\ 0, & l \neq i \end{cases} \tag{11A.6}$$

Substituting Eq. (11A.5) in Eq. (11A.3) and noting that k_{lm}(n) = k_{ml}(n), we obtain

$$E[c_{lm}(n)] = r^{u_au_a}_{lm} \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} r^{xx}_{ij}\,k_{ij}(n) + 2k_{lm}(n) = r^{u_au_a}_{lm}\,\mathrm{tr}[\mathbf{K}(n)\mathbf{R}_{xx}] + 2k_{lm}(n) \tag{11A.7}$$

for l = 0, 1, ..., N − 1 and m = 0, 1, ..., N − 1.
Combining these elements to construct the matrix E[C(n)] = E[u_a(n)x^T(n)K(n)x(n)u_a^T(n)],
we get Eq. (11.142).
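
The step from Eq. (11A.5) to Eq. (11A.7) is pure index bookkeeping, so it can be checked
numerically on a small case. The sketch below is only an illustration under assumed inputs:
the 2-by-2 matrices are arbitrary hypothetical values (K symmetric, as the derivation
requires), and the function names are not from the book. It sums the right-hand side of
Eq. (11A.5) against k_ij term by term and compares the result with the closed form.

```python
def expected_c_by_summation(r_uu, r_xx, k):
    """Evaluate Eq. (11A.3) with the fourth-order moment replaced by Eq. (11A.5)."""
    n = len(k)

    def delta(a, b):
        return 1.0 if a == b else 0.0

    c = [[0.0] * n for _ in range(n)]
    for l in range(n):
        for m in range(n):
            for i in range(n):
                for j in range(n):
                    moment = (r_uu[l][m] * r_xx[i][j]
                              + delta(l, i) * delta(m, j)
                              + delta(l, j) * delta(m, i))
                    c[l][m] += moment * k[i][j]
    return c


def expected_c_closed_form(r_uu, r_xx, k):
    """Closed form of Eq. (11A.7): r_uu[l][m] * tr[K R_xx] + 2 k[l][m]."""
    n = len(k)
    trace_kr = sum(k[i][j] * r_xx[j][i] for i in range(n) for j in range(n))
    return [[r_uu[l][m] * trace_kr + 2.0 * k[l][m] for m in range(n)] for l in range(n)]


# Hypothetical symmetric test matrices (not taken from the book).
R_XX = [[1.0, 0.5], [0.5, 1.0]]
R_UU = [[1.6, -0.7], [-0.7, 1.6]]
K = [[2.0, 0.3], [0.3, 1.0]]

print(expected_c_by_summation(R_UU, R_XX, K))
print(expected_c_closed_form(R_UU, R_XX, K))
```

The two delta terms collapse to k_lm + k_ml, which equals 2k_lm only because K(n) is
symmetric; with a non-symmetric K the two functions would disagree.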
Appendix 11B: Evaluation of the parameter γ
To evaluate γ, we proceed as follows:

$$\mathrm{tr}[\mathbf{R}_{u_au_a}\mathbf{R}_{xx}] = \mathrm{tr}\big[E[\mathbf{u}_a(n)\mathbf{u}_a^{\mathrm{T}}(n)]\mathbf{R}_{xx}\big] = E\big[\mathrm{tr}[\mathbf{u}_a(n)\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{R}_{xx}]\big] \tag{11B.1}$$

We note that for any pair of matrices A and B with dimensions N_1-by-N_2 and N_2-by-N_1,
respectively, tr[AB] = tr[BA]. Using this in Eq. (11B.1), we may write

$$\mathrm{tr}[\mathbf{R}_{u_au_a}\mathbf{R}_{xx}] = E[\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{R}_{xx}\mathbf{u}_a(n)] \tag{11B.2}$$

Note that the trace function has been dropped from the right-hand side of Eq. (11B.2), as
u_a^T(n)R_xx u_a(n) is a scalar.
Next, we recall that the correlation matrix R_xx may be decomposed as (see Eq. (4.23))

$$\mathbf{R}_{xx} = \sum_{i=0}^{N-1} \lambda_i \mathbf{q}_i \mathbf{q}_i^{\mathrm{T}} \tag{11B.3}$$

where the λ_i's and q_i's are the eigenvalues and eigenvectors, respectively, of R_xx. Using
Eq. (11B.3) in Eq. (11B.2), we obtain

$$\mathrm{tr}[\mathbf{R}_{u_au_a}\mathbf{R}_{xx}] = \sum_{i=0}^{N-1} \lambda_i \eta_i \tag{11B.4}$$

where η_i = E[(q_i^T u_a(n))^2].

Now, we shall analyze the terms λ_i η_i. For this, we refer to Figure 11B.1, which depicts a
procedure for measuring λ_i η_i through a sequence of filtering and averaging procedures.
The AR process x(n) is generated by passing its innovation, v(n), through its model
transfer function

$$H_{\mathrm{AR}}(e^{j\omega}) = \frac{1}{1 - \sum_{m=1}^{M} a_{M,m}\, e^{-jm\omega}} \tag{11B.5}$$

The innovation v(n) is a white noise process with variance σ_v^2. Passing x(n) through the
eigenfilter Q_i(e^{jω}) (the FIR filter whose coefficients are the elements of the eigenvector
q_i) generates a signal whose mean-squared value is equal to λ_i. On the other hand,
according to Figure 11.13, the sequence u_a(n) is generated from b_M(n + M) by first
multiplying that by σ_{b_M}^{-2} (the inverse of the variance of b_M(n + M)) and then passing
the result through an FIR filter with the transfer function z^{-M}H_{b_M}(z^{-1}), which is
nothing but 1/H_AR(e^{jω}). Passing u_a(n) through the eigenfilter Q_i(e^{jω}) generates the
samples of the sequence q_i^T u_a(n), whose mean-squared value is then measured. Accordingly,
following Figure 11B.1, one will find that


$$\lambda_i = \frac{1}{2\pi}\int_0^{2\pi} \sigma_v^2\, |H_{\mathrm{AR}}(e^{j\omega})|^2\, |Q_i(e^{j\omega})|^2\, d\omega \tag{11B.6}$$

and

$$\eta_i = \frac{1}{2\pi}\int_0^{2\pi} \sigma_{b_M}^{-2}\, \frac{|Q_i(e^{j\omega})|^2}{|H_{\mathrm{AR}}(e^{j\omega})|^2}\, d\omega \tag{11B.7}$$

Figure 11B.1 Procedure for the evaluation of λ_i η_i.

We also note that the innovation process v(n) and the backward prediction error, b_M(n)
(or equivalently b_M(n + M)), are statistically the same. This implies σ_{b_M}^2 = σ_v^2. Noting
this, Eqs. (11B.6) and (11B.7) give

$$\lambda_i \eta_i = \left(\frac{1}{2\pi}\right)^2 \left(\int_0^{2\pi} |H_{\mathrm{AR}}(e^{j\omega})|^2\, |Q_i(e^{j\omega})|^2\, d\omega\right) \left(\int_0^{2\pi} \frac{|Q_i(e^{j\omega})|^2}{|H_{\mathrm{AR}}(e^{j\omega})|^2}\, d\omega\right) \tag{11B.8}$$

Equation (11B.8) is in an appropriate form that may be used to give some argument with
regard to the value of λ_i η_i and the overall summation in Eq. (11B.4).

Now, if f(x) and g(x) are two arbitrary functions with finite energy in the interval
(a, b), then the Cauchy-Schwarz inequality states that

$$\left|\int_a^b f(x)g(x)\,dx\right|^2 \le \left(\int_a^b |f(x)|^2\,dx\right)\left(\int_a^b |g(x)|^2\,dx\right) \tag{11B.9}$$

with the equality valid when f(x) = αg(x), α being a scalar. Using this, Eq. (11B.8)
gives

$$\lambda_i \eta_i \ge \left(\frac{1}{2\pi}\int_0^{2\pi} |Q_i(e^{j\omega})|^2\, d\omega\right)^2 \tag{11B.10}$$

Noting that Q_i(e^{jω}) is a normalized eigenfilter in the sense that q_i^T q_i = 1, the right-hand
side of Eq. (11B.10) is always equal to unity (Chapter 4). Using this result in Eq. (11B.4)
and recalling the definition of the parameter γ, we obtain

$$\gamma \ge 1 \tag{11B.11}$$

A particular case of interest for which the inequality (11B.10) (and thus Eq. (11B.11))
will be converted to equality is when |Q_i(e^{jω})|^2 is an impulse function in the form
2πδ(ω − ω_i). In fact, this happens to be nearly the case as the filter length, N, increases
to a large value. With this argument one can say that the above inequalities will all be
close to equalities when the filter length, N, is large.
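
The bound in Eq. (11B.10) can be illustrated with discrete (Riemann-sum) versions of the two
integrals in Eq. (11B.8). The sketch below is an illustration under assumptions rather than
the book's procedure: a first-order AR model and a two-tap unit-norm filter stand in for
H_AR and the eigenfilter Q_i, the variances are set to 1 (they cancel in the product), and
the function names are hypothetical.

```python
import cmath
import math


def freq_response(coeffs, omega):
    """Frequency response of the FIR filter with the given coefficients."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(coeffs))


def lambda_eta_product(ar_den, q, num_points=512):
    """Return (lambda_i * eta_i, bound), both computed on a uniform frequency grid.

    ar_den holds the denominator coefficients of the AR model, so that
    |H_AR|^2 = 1 / |A(e^{jw})|^2; q is the unit-norm stand-in for the eigenfilter.
    """
    lam = eta = energy = 0.0
    for k in range(num_points):
        omega = 2.0 * math.pi * k / num_points
        h_sq = 1.0 / abs(freq_response(ar_den, omega)) ** 2
        q_sq = abs(freq_response(q, omega)) ** 2
        lam += h_sq * q_sq       # integrand of Eq. (11B.6)
        eta += q_sq / h_sq       # integrand of Eq. (11B.7)
        energy += q_sq           # integrand on the right of Eq. (11B.10)
    lam, eta, energy = lam / num_points, eta / num_points, energy / num_points
    return lam * eta, energy ** 2


product, bound = lambda_eta_product([1.0, -0.9], [0.6, 0.8])
print(product, bound)
```

Because the stand-in filter has unit norm, the bound evaluates to 1, and the product can only
approach it when the filter's response is concentrated where |H_AR| is nearly constant, which
is the narrowband situation the text describes for large N.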


12 


Method of Least-Squares 


The problem of filter design for estimating a desired signal based on another signal can 
be formulated from either a statistical point of view or deterministic point of view, as was 
mentioned in Chapter 1. The Wiener filter and its adaptive version (LMS algorithm and 
its derivatives) belong to the statistical framework as their design is based on minimizing 
a statistical quantity, the mean-squared error (MSE). So far, all our discussions have been
limited to this statistical class of algorithms. In the next two chapters, we are going to 
consider the second class of algorithms that are derived based on the method of least- 
squares, which belongs to the deterministic framework. We have noted that the class of 
LMS-based algorithms is very wide and covers a large variety of algorithms, each having 
some merits over the others. The class of least-squares-based algorithms is also equally 
wide. Current literature contains a large number of scientific papers that report a diverse 
range of least-squares-based adaptive filtering algorithms. 

We recall that, in the derivation of the LMS algorithm, the goal was to minimize the
mean-squared value of the estimation error. In the method of least-squares, on the other
hand, at any time instant n > 0, the adaptive filter parameters (tap weights) are calculated
so that the quantity

$$\zeta(n) = \sum_{k=1}^{n} \rho_n(k)\, e_n^2(k) \tag{12.1}$$

is minimized, hence the name least-squares. In Eq. (12.1), k = 1 is the time at which
the algorithm starts, e_n(k), for k = 1, 2, ..., n, are the samples of error estimates that
would be obtained if the filter were run from time k = 1 to n using the set of filter
parameters that are computed at time "n," and ρ_n(k) is a weighting function whose role
will be discussed later. Thus, in the method of least-squares, the filter parameters are
optimized using all the observations from the time the filter begins until the present time
and minimizing the sum of squared values of the error samples of the filter output. Clearly,
this is a deterministic optimization of the filter parameters based on the observed data.
An insightful interpretation of the method of least-squares is its curve fitting property.
Consider a curve whose samples are the desired output samples of the adaptive filter. In the
same manner, samples of the filter output (given some input sequence) can be considered
to constitute another curve. Then, the problem of choosing the filter parameters to find the
best fit between these two curves boils down to the method of least-squares, if we define
the best fit as one which minimizes a weighted sum of squared values of the differences
between the samples of the two curves.

In this book, our discussion on the method of least-squares is rather limited. In this 
chapter, we first present a formulation of the problem of least-squares for a linear combiner 
and discuss some of its properties. We also introduce the standard recursive least-squares 
(RLS) algorithm as an example of the class of least-squares-based adaptive filtering algo- 
rithms. Some results that compare the LMS and RLS algorithms are also given in this 
chapter. In the next chapter, we present the development of fast RLS algorithms that are 
computationally more efficient than the standard RLS algorithm, for recursive implemen- 
tation of the method of least-squares. 


12.1 Formulation of Least-Squares Estimation 
for a Linear Combiner 


Consider a linear adaptive filter with the observed real-valued input vector x(n) =
[x_0(n) x_1(n) ··· x_{N−1}(n)]^T, tap-weight vector w(n) = [w_0(n) w_1(n) ··· w_{N−1}(n)]^T, and
desired output d(n). The filter output is obtained as the inner product of w(n) and x(n),
that is, w^T(n)x(n). Note that, here, we have not specified any particular structure for
the elements of the input vector x(n). The elements of x(n) may be successive samples
of a particular input process, as happens in the case of transversal filters, or may be
samples of a parallel set of input sources, as in the case of antenna arrays.

In the method of least-squares, at time instant "n," we choose w(n) so that the
summation (12.1) is minimized. We define

$$y_n(k) = \mathbf{w}^{\mathrm{T}}(n)\mathbf{x}(k), \quad \text{for } k = 1, 2, \ldots, n \tag{12.2}$$

as the filter output generated using the tap-weight vector w(n). The corresponding
estimation error would then be

$$e_n(k) = d(k) - y_n(k) \tag{12.3}$$

Thus, we note from Eqs. (12.1) and (12.2) that the addition of the subscript "n" to
the samples of filter output, y_n(k), and the error estimates, e_n(k), is to emphasize that
these quantities are computed using the solution, w(n), at instant n that is obtained by
minimizing the weighted sum of error squares over all the instants up to "n."

To keep our derivations simple, we assume that the weighting function ρ_n(k) is equal to
1, for all values of k, in the first three sections of this chapter. We also adopt a
matrix/vector formulation of the problem.

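We define the following vectors:

$$\mathbf{d}(n) = [d(1)\ d(2)\ \cdots\ d(n)]^{\mathrm{T}} \tag{12.4}$$

$$\mathbf{y}(n) = [y_n(1)\ y_n(2)\ \cdots\ y_n(n)]^{\mathrm{T}} \tag{12.5}$$

and

$$\mathbf{e}(n) = [e_n(1)\ e_n(2)\ \cdots\ e_n(n)]^{\mathrm{T}} \tag{12.6}$$

We also define the matrix of observed input samples as

$$\mathbf{X}(n) = [\mathbf{x}(1)\ \mathbf{x}(2)\ \cdots\ \mathbf{x}(n)] \tag{12.7}$$

Then, using Eqs. (12.2) and (12.3) in Eqs. (12.4)-(12.7), we get

$$\mathbf{y}(n) = \mathbf{X}^{\mathrm{T}}(n)\mathbf{w}(n) \tag{12.8}$$

and

$$\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{y}(n) \tag{12.9}$$

Furthermore, with ρ_n(k) = 1, for all k, Eq. (12.1) can be written as

$$\zeta(n) = \mathbf{e}^{\mathrm{T}}(n)\mathbf{e}(n) \tag{12.10}$$

Substituting Eqs. (12.8) and (12.9) in Eq. (12.10), we obtain

$$\zeta(n) = \mathbf{d}^{\mathrm{T}}(n)\mathbf{d}(n) - 2\boldsymbol{\theta}^{\mathrm{T}}(n)\mathbf{w}(n) + \mathbf{w}^{\mathrm{T}}(n)\boldsymbol{\Psi}(n)\mathbf{w}(n) \tag{12.11}$$

where

$$\boldsymbol{\Psi}(n) = \mathbf{X}(n)\mathbf{X}^{\mathrm{T}}(n) \tag{12.12}$$

and

$$\boldsymbol{\theta}(n) = \mathbf{X}(n)\mathbf{d}(n) \tag{12.13}$$

Setting the gradient of ζ(n) with respect to the tap-weight vector w(n) equal to zero
and following the same line of derivations as in the case of Wiener filters (Chapter 3),
we obtain

$$\boldsymbol{\Psi}(n)\hat{\mathbf{w}}(n) = \boldsymbol{\theta}(n) \tag{12.14}$$

where ŵ(n) is the estimate of the filter tap-weight vector in the least-squares sense.
Equation (12.14) is known as the normal equation for a linear least-squares filter. It
results in the following least-squares solution

$$\hat{\mathbf{w}}(n) = \boldsymbol{\Psi}^{-1}(n)\boldsymbol{\theta}(n) \tag{12.15}$$

Substituting Eq. (12.15) in Eq. (12.11), the minimum value of ζ(n) is obtained as

$$\zeta_{\min}(n) = \mathbf{d}^{\mathrm{T}}(n)\mathbf{d}(n) - \boldsymbol{\theta}^{\mathrm{T}}(n)\boldsymbol{\Psi}^{-1}(n)\boldsymbol{\theta}(n) = \mathbf{d}^{\mathrm{T}}(n)\mathbf{d}(n) - \boldsymbol{\theta}^{\mathrm{T}}(n)\hat{\mathbf{w}}(n) \tag{12.16}$$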
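
The gradient step that leads to the normal equation is only referred to above, so it may help
to see it written out. The following is a sketch of that step, assuming only that the matrix
X(n)X^T(n) is symmetric and that it is invertible when the least-squares solution is formed;
the symbol \nabla_{\mathbf{w}} for the gradient with respect to w(n) is notation introduced here.

```latex
\begin{aligned}
\nabla_{\mathbf{w}}\,\zeta(n)
  &= \nabla_{\mathbf{w}}\!\left[\mathbf{d}^{\mathrm{T}}(n)\mathbf{d}(n)
     - 2\boldsymbol{\theta}^{\mathrm{T}}(n)\mathbf{w}(n)
     + \mathbf{w}^{\mathrm{T}}(n)\boldsymbol{\Psi}(n)\mathbf{w}(n)\right] \\
  &= -2\boldsymbol{\theta}(n) + 2\boldsymbol{\Psi}(n)\mathbf{w}(n), \\
\nabla_{\mathbf{w}}\,\zeta(n) = \mathbf{0}
  &\;\Longrightarrow\; \boldsymbol{\Psi}(n)\hat{\mathbf{w}}(n) = \boldsymbol{\theta}(n).
\end{aligned}
```

The factor of 2 in the quadratic term relies on the symmetry of X(n)X^T(n); for a general
square matrix A the gradient of w^T A w would be (A + A^T)w instead.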


12.2 Principle of Orthogonality 


We recall that in the case of Wiener filters, the optimized output error, e_o(n), is orthogonal
to the filter tap inputs, in the sense that the following identities hold:

$$E[e_o(n)x_i(n)] = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{12.17}$$

where x_i(n) is the ith element of the tap-input vector x(n), and E[·] denotes the statistical
expectation. This was called the principle of orthogonality for Wiener filters. A similar
result can also be derived in the case of linear least-squares estimation by following the
same line of derivations as those given in Chapter 3 (Section 3.3).
Using Eqs. (12.6) and (12.10), we obtain

$$\frac{\partial \zeta(n)}{\partial w_i(n)} = 2\sum_{k=1}^{n} e_n(k)\,\frac{\partial e_n(k)}{\partial w_i(n)} \tag{12.18}$$

Using the identity

$$e_n(k) = d(k) - \sum_{i=0}^{N-1} w_i(n)\,x_i(k) \tag{12.19}$$

to evaluate the second factor on the right-hand side of Eq. (12.18), we get

$$\frac{\partial \zeta(n)}{\partial w_i(n)} = -2\sum_{k=1}^{n} e_n(k)\,x_i(k) \tag{12.20}$$

Furthermore, we note that when w(n) = ŵ(n),

$$\frac{\partial \zeta(n)}{\partial w_i(n)} = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{12.21}$$

Using Eqs. (12.20) and (12.21), we find that when w(n) = ŵ(n), the following identities
hold:

$$\sum_{k=1}^{n} \hat{e}_n(k)\,x_i(k) = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{12.22}$$

where ê_n(k) is the optimized estimation error in the least-squares sense. This result, which
is equivalent to Eq. (12.17), is known as the principle of orthogonality in the least-squares
formulation. We define the vectors

$$\hat{\mathbf{e}}(n) = [\hat{e}_n(1)\ \hat{e}_n(2)\ \cdots\ \hat{e}_n(n)]^{\mathrm{T}} \tag{12.23}$$

and

$$\mathbf{x}_i(n) = [x_i(1)\ x_i(2)\ \cdots\ x_i(n)]^{\mathrm{T}} \tag{12.24}$$

and note that Eq. (12.22) may also be expressed in terms of these vectors as

$$\hat{\mathbf{e}}^{\mathrm{T}}(n)\mathbf{x}_i(n) = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{12.25}$$

We note that the left-hand side of Eq. (12.25) is the inner product of ê(n) and x_i(n), thus
the name principle of orthogonality.

Comparison of Eqs. (12.17) and (12.25) reveals that the definition of orthogonality
is in terms of statistical averages in Wiener filtering, whereas it is in terms of inner
products of data vectors in the case of least-squares estimation. By dividing both sides
of Eq. (12.22) by n, we see that we can also use time averages to define orthogonality in
the least-squares case:

$$\frac{1}{n}\sum_{k=1}^{n} \hat{e}_n(k)\,x_i(k) = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{12.26}$$
An immediate corollary to the principle of orthogonality is that when the tap weights of a
filter are optimized in the least-squares sense, the filter output and its optimized estimation
error are orthogonal. That is,

$$\hat{\mathbf{e}}^{\mathrm{T}}(n)\hat{\mathbf{y}}(n) = 0 \tag{12.27}$$

where ŷ(n) is the vector of the output samples of the filter when w(n) = ŵ(n). This
follows immediately if we note that Eq. (12.8) may also be written as

$$\mathbf{y}(n) = \sum_{i=0}^{N-1} w_i(n)\,\mathbf{x}_i(n) \tag{12.28}$$

and use the identity (12.25) to obtain Eq. (12.27).


Example 12.1 


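Consider the case where n = 3,

$$\mathbf{X}(3) = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0.1 \end{bmatrix}$$

and

$$\mathbf{d}(3) = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$$

We wish to find $\hat{\mathbf{w}}(3)$, $\hat{\mathbf{y}}(3)$, and $\hat{\mathbf{e}}(3)$ and to confirm the
principle of orthogonality. We have

$$\boldsymbol{\Psi}(3) = \mathbf{X}(3)\mathbf{X}^T(3) = \begin{bmatrix} 5 & 4 \\ 4 & 5.01 \end{bmatrix}$$

and

$$\boldsymbol{\theta}(3) = \mathbf{X}(3)\mathbf{d}(3) = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

Thus,

$$\hat{\mathbf{w}}(3) = \begin{bmatrix} 5 & 4 \\ 4 & 5.01 \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ -1 \end{bmatrix}
= \frac{1}{9.05}\begin{bmatrix} 5.01 & -4 \\ -4 & 5 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix}
= \frac{1}{9.05}\begin{bmatrix} 9.01 \\ -9 \end{bmatrix}$$

$$\hat{\mathbf{y}}(3) = \hat{w}_0(3)\,\mathbf{x}_0(3) + \hat{w}_1(3)\,\mathbf{x}_1(3)
= \frac{9.01}{9.05}\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} - \frac{9}{9.05}\begin{bmatrix} 1 \\ 2 \\ 0.1 \end{bmatrix}
= \frac{1}{9.05}\begin{bmatrix} 9.02 \\ -8.99 \\ -0.9 \end{bmatrix}$$

and

$$\hat{\mathbf{e}}(3) = \mathbf{d}(3) - \hat{\mathbf{y}}(3)
= \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} - \frac{1}{9.05}\begin{bmatrix} 9.02 \\ -8.99 \\ -0.9 \end{bmatrix}
= \frac{1}{9.05}\begin{bmatrix} 0.03 \\ -0.06 \\ 0.9 \end{bmatrix}$$

We can now confirm the principle of orthogonality by noting that

$$\hat{\mathbf{e}}^T(3)\,\mathbf{x}_0(3) = 0$$

and, also,

$$\hat{\mathbf{e}}^T(3)\,\mathbf{x}_1(3) = 0$$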
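The arithmetic of Example 12.1 can be checked with a few lines of exact rational arithmetic. The sketch below is only an illustration: the entries of X(3) and d(3) are assumptions inferred from the intermediate results of the example, and the helper `dot` is a name introduced here.

```python
from fractions import Fraction as F

# Assumed data of Example 12.1: the rows of X are x_0^T(3) and x_1^T(3).
X = [[F(2), F(1), F(0)],
     [F(1), F(2), F(1, 10)]]
d = [F(1), F(-1), F(0)]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

# Psi = X X^T and theta = X d.
psi = [[dot(X[i], X[j]) for j in range(2)] for i in range(2)]
theta = [dot(X[i], d) for i in range(2)]

# Least-squares tap weights via the closed-form inverse of a 2-by-2 matrix.
det = psi[0][0] * psi[1][1] - psi[0][1] * psi[1][0]
w = [(psi[1][1] * theta[0] - psi[0][1] * theta[1]) / det,
     (psi[0][0] * theta[1] - psi[1][0] * theta[0]) / det]

# Filter output y = X^T w and estimation error e = d - y.
y = [X[0][k] * w[0] + X[1][k] * w[1] for k in range(3)]
e = [d[k] - y[k] for k in range(3)]

print([float(v) for v in w])
print(dot(e, X[0]), dot(e, X[1]), dot(e, y))  # all three inner products vanish
```

Using `Fraction` rather than floating point makes the orthogonality conditions hold exactly instead of to within roundoff.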


12.3 Projection Operator 


An alternative interpretation of the solution of the least-squares problem can be given using
the concept of the projection operator. The projection of an n-by-1 vector $\mathbf{d}(n)$ onto the
subspace spanned by a set of vectors $\mathbf{x}_0(n), \mathbf{x}_1(n), \ldots, \mathbf{x}_{N-1}(n)$ is a
vector $\hat{\mathbf{d}}(n)$ with the following properties:


1. The vector $\hat{\mathbf{d}}(n)$ is obtained as a linear combination of the vectors
$\mathbf{x}_0(n), \mathbf{x}_1(n), \ldots, \mathbf{x}_{N-1}(n)$.

2. Among all the vectors in the subspace spanned by $\mathbf{x}_0(n), \mathbf{x}_1(n), \ldots,
\mathbf{x}_{N-1}(n)$, the vector $\hat{\mathbf{d}}(n)$ has the minimum Euclidean distance from
$\mathbf{d}(n)$.

3. The difference $\mathbf{d}(n) - \hat{\mathbf{d}}(n)$ is a vector that is orthogonal to the subspace
spanned by $\mathbf{x}_0(n), \mathbf{x}_1(n), \ldots, \mathbf{x}_{N-1}(n)$.


We may note that the least-squares estimate $\hat{\mathbf{y}}(n)$ satisfies the three properties listed
above. Namely, we note from Eq. (12.28) that $\hat{\mathbf{y}}(n)$ is also obtained as a linear
combination of the vectors $\mathbf{x}_0(n), \mathbf{x}_1(n), \ldots, \mathbf{x}_{N-1}(n)$. Furthermore,
obtaining $\hat{\mathbf{y}}(n)$ by minimizing $\hat{\mathbf{e}}^T(n)\hat{\mathbf{e}}(n)$, where
$\hat{\mathbf{e}}(n) = \mathbf{d}(n) - \hat{\mathbf{y}}(n)$, is equivalent to minimizing the Euclidean
distance between $\mathbf{d}(n)$ and $\hat{\mathbf{y}}(n)$. Also, from the principle of orthogonality,
the error vector $\hat{\mathbf{e}}(n) = \mathbf{d}(n) - \hat{\mathbf{y}}(n)$ is orthogonal to the vectors
$\mathbf{x}_0(n), \mathbf{x}_1(n), \ldots, \mathbf{x}_{N-1}(n)$. We thus conclude that
$\hat{\mathbf{y}}(n)$ is nothing but the projection of $\mathbf{d}(n)$ onto the subspace spanned by the
vectors $\mathbf{x}_0(n), \mathbf{x}_1(n), \ldots, \mathbf{x}_{N-1}(n)$.


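We also note that from Eq. (12.8)

$$\hat{\mathbf{y}}(n) = \mathbf{X}^T(n)\,\hat{\mathbf{w}}(n) \tag{12.29}$$

Substituting Eqs. (12.12) and (12.13) in Eq. (12.15) and the result in Eq. (12.29), we
obtain

$$\hat{\mathbf{y}}(n) = \mathbf{P}(n)\,\mathbf{d}(n) \tag{12.30}$$

where

$$\mathbf{P}(n) = \mathbf{X}^T(n)\left(\mathbf{X}(n)\mathbf{X}^T(n)\right)^{-1}\mathbf{X}(n) \tag{12.31}$$

Consequently, the matrix $\mathbf{P}(n)$ is known as the projection operator.
Using Eq. (12.30), we find that the optimized error vector
$\hat{\mathbf{e}}(n) = \mathbf{d}(n) - \hat{\mathbf{y}}(n)$ can be expressed as

$$\hat{\mathbf{e}}(n) = [\mathbf{I} - \mathbf{P}(n)]\,\mathbf{d}(n) \tag{12.32}$$

where $\mathbf{I}$ is the identity matrix of the same dimension as $\mathbf{P}(n)$. As a result, the
matrix $\mathbf{I} - \mathbf{P}(n)$ is referred to as the orthogonal complement projection operator.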
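As a small numerical illustration of Eqs. (12.30) to (12.32), the following sketch builds P(n) and I − P(n) for a hypothetical 2-by-3 data matrix and checks the properties expected of a projection. The data values and the helpers `matmul`, `transpose`, and `inv2` are assumptions made for this example only.

```python
from fractions import Fraction as F

# Hypothetical data: the rows of X are the vectors x_0(n) and x_1(n), n = 3.
X = [[F(2), F(1), F(0)],
     [F(1), F(2), F(1, 10)]]
d = [F(1), F(-1), F(0)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(A):
    """Inverse of a 2-by-2 matrix (sufficient for N = 2 tap weights)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

Xt = transpose(X)
P = matmul(matmul(Xt, inv2(matmul(X, Xt))), X)                 # Eq. (12.31)
I = [[F(int(i == j)) for j in range(3)] for i in range(3)]
Q = [[I[i][j] - P[i][j] for j in range(3)] for i in range(3)]  # I - P

y = [row[0] for row in matmul(P, [[v] for v in d])]            # Eq. (12.30)
e = [row[0] for row in matmul(Q, [[v] for v in d])]            # Eq. (12.32)

print(matmul(P, P) == P)                   # projecting twice changes nothing
print(sum(a * b for a, b in zip(y, e)))    # output and error are orthogonal
```

The check `matmul(Q, Xt)` returning an all-zero matrix would likewise confirm that I − P(n) annihilates every vector in the subspace spanned by the rows of X.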

12.4 Standard Recursive Least-Squares Algorithm


The least-squares solution provided by Eq. (12.15) is of very little interest in the actual
implementation of adaptive filters, as it requires that all the past samples of the input as
well as the desired output be available at every iteration. Furthermore, the number of
operations needed to calculate $\hat{\mathbf{w}}(n)$ grows in proportion to n, as the number of columns
of $\mathbf{X}(n)$ and the length of $\mathbf{d}(n)$ grow with n. These problems are solved by employing
recursive methods. In this section, as an example of recursive methods, we present the
standard RLS algorithm.


12.4.1 RLS Recursions 


In the standard RLS algorithm (or just the "RLS algorithm", for short), the weighting factor
$\rho_n(k)$ is chosen as

$$\rho_n(k) = \lambda^{n-k}, \quad k = 1, 2, \ldots, n \tag{12.33}$$

where $\lambda$ is a positive constant close to, but smaller than, 1. The ordinary method of
least-squares, discussed in the previous sections, corresponds to the case of $\lambda = 1$. The
parameter $\lambda$ is known as the forgetting factor. Clearly, when $\lambda < 1$, the weighting factors
defined by Eq. (12.33) give more weight to the recent samples of the error estimates
(and thus to the recent samples of the observed data) than to the old ones. In other
words, the choice of $\lambda < 1$ results in a scheme that puts more emphasis on the recent
samples of the observed data and tends to forget the past. This is exactly what one
wants when developing an adaptive algorithm with some tracking capability. Roughly
speaking, $1/(1 - \lambda)$ is a measure of the memory of the algorithm. The case of $\lambda = 1$
corresponds to infinite memory.
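To make the notion of memory concrete, the sketch below evaluates the weighting factors of Eq. (12.33) for an illustrative forgetting factor and compares their sum with 1/(1 − λ). The values of `lam` and `n` are arbitrary choices for this example.

```python
lam = 0.98   # forgetting factor (illustrative value)
n = 500      # current time index (illustrative value)

# Weighting factors rho_n(k) = lam**(n - k) for k = 1, ..., n, Eq. (12.33).
rho = [lam ** (n - k) for k in range(1, n + 1)]

total = sum(rho)                           # effective number of samples in use
closed_form = (1 - lam ** n) / (1 - lam)   # geometric-series sum of the weights
memory = 1 / (1 - lam)                     # rough memory of the algorithm

print(round(total, 3), round(closed_form, 3), round(memory, 3))
```

For large n the sum of the weights approaches 1/(1 − λ), which is one way of reading the statement that the algorithm effectively remembers about 1/(1 − λ) samples.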

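Substituting Eq. (12.33) in Eq. (12.1) and using the vector/matrix notations of
Section 12.1, we obtain

$$\zeta(n) = \mathbf{e}^T(n)\,\boldsymbol{\Lambda}(n)\,\mathbf{e}(n) \tag{12.34}$$

where $\boldsymbol{\Lambda}(n)$ is the diagonal matrix consisting of the weighting factors
$1, \lambda, \lambda^2, \ldots$, that is,

$$\boldsymbol{\Lambda}(n) = \begin{bmatrix}
\lambda^{n-1} & 0 & 0 & \cdots & 0 \\
0 & \lambda^{n-2} & 0 & \cdots & 0 \\
0 & 0 & \lambda^{n-3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix} \tag{12.35}$$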


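Following the same line of derivations that led to Eq. (12.15), we obtain the minimizer
of $\zeta(n)$ in Eq. (12.34) as

$$\hat{\mathbf{w}}(n) = \boldsymbol{\Psi}_\lambda^{-1}(n)\,\boldsymbol{\theta}_\lambda(n) \tag{12.36}$$

where

$$\boldsymbol{\Psi}_\lambda(n) = \mathbf{X}(n)\,\boldsymbol{\Lambda}(n)\,\mathbf{X}^T(n) \tag{12.37}$$

and

$$\boldsymbol{\theta}_\lambda(n) = \mathbf{X}(n)\,\boldsymbol{\Lambda}(n)\,\mathbf{d}(n) \tag{12.38}$$

On substituting Eqs. (12.4) and (12.7) in Eqs. (12.37) and (12.38), and expanding the
summations, we get

$$\boldsymbol{\Psi}_\lambda(n) = \mathbf{x}(n)\mathbf{x}^T(n) + \lambda\,\mathbf{x}(n-1)\mathbf{x}^T(n-1)
+ \lambda^2\,\mathbf{x}(n-2)\mathbf{x}^T(n-2) + \cdots \tag{12.39}$$

and

$$\boldsymbol{\theta}_\lambda(n) = \mathbf{x}(n)d(n) + \lambda\,\mathbf{x}(n-1)d(n-1)
+ \lambda^2\,\mathbf{x}(n-2)d(n-2) + \cdots \tag{12.40}$$

respectively. Using Eqs. (12.39) and (12.40), it is straightforward to see that
$\boldsymbol{\Psi}_\lambda(n)$ and $\boldsymbol{\theta}_\lambda(n)$ can be obtained recursively as

$$\boldsymbol{\Psi}_\lambda(n) = \lambda\,\boldsymbol{\Psi}_\lambda(n-1) + \mathbf{x}(n)\mathbf{x}^T(n) \tag{12.41}$$

and

$$\boldsymbol{\theta}_\lambda(n) = \lambda\,\boldsymbol{\theta}_\lambda(n-1) + \mathbf{x}(n)d(n) \tag{12.42}$$

respectively. These two recursions and the following result of matrix algebra form the
basis for the derivation of the RLS algorithm.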
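For an arbitrary nonsingular N-by-N matrix $\mathbf{A}$, any N-by-1 vector $\mathbf{a}$, and a scalar
$\alpha$,

$$(\mathbf{A} + \alpha\,\mathbf{a}\mathbf{a}^T)^{-1} = \mathbf{A}^{-1}
- \frac{\alpha\,\mathbf{A}^{-1}\mathbf{a}\mathbf{a}^T\mathbf{A}^{-1}}{1 + \alpha\,\mathbf{a}^T\mathbf{A}^{-1}\mathbf{a}} \tag{12.43}$$

This identity, which was also used for some other derivations in Chapter 6, is a special
form of the matrix inversion lemma.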
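Identity (12.43) is easy to verify numerically. The sketch below does so for an arbitrary 2-by-2 case in exact rational arithmetic; the particular values of A, a, and α, and the helper `inv2`, are assumptions chosen only for this check.

```python
from fractions import Fraction as F

def inv2(M):
    """Inverse of a 2-by-2 matrix stored as a list of rows."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

A = [[F(4), F(1)], [F(1), F(3)]]   # arbitrary nonsingular matrix
a = [F(1), F(2)]                   # arbitrary vector
alpha = F(1, 2)                    # arbitrary scalar

# Left-hand side: direct inverse of A + alpha * a * a^T.
lhs = inv2([[A[i][j] + alpha * a[i] * a[j] for j in range(2)]
            for i in range(2)])

# Right-hand side: rank-one correction of A^{-1}.
Ai = inv2(A)
Aia = [Ai[i][0] * a[0] + Ai[i][1] * a[1] for i in range(2)]   # A^{-1} a
aAi = [a[0] * Ai[0][j] + a[1] * Ai[1][j] for j in range(2)]   # a^T A^{-1}
denom = 1 + alpha * (a[0] * Aia[0] + a[1] * Aia[1])
rhs = [[Ai[i][j] - alpha * Aia[i] * aAi[j] / denom for j in range(2)]
       for i in range(2)]

print(lhs == rhs)
```

The practical point is that the right-hand side needs only matrix-vector products once the inverse of A is known, which is what makes the recursive update of the inverse matrix inexpensive.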

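We let $\mathbf{A} = \lambda\boldsymbol{\Psi}_\lambda(n-1)$, $\mathbf{a} = \mathbf{x}(n)$, and $\alpha = 1$
to evaluate the inverse of $\boldsymbol{\Psi}_\lambda(n) = \lambda\boldsymbol{\Psi}_\lambda(n-1) +
\mathbf{x}(n)\mathbf{x}^T(n)$. This results in the following recursive equation for updating
the inverse of $\boldsymbol{\Psi}_\lambda(n)$:

$$\boldsymbol{\Psi}_\lambda^{-1}(n) = \lambda^{-1}\boldsymbol{\Psi}_\lambda^{-1}(n-1)
- \frac{\lambda^{-2}\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)\,\mathbf{x}(n)\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)}
{1 + \lambda^{-1}\,\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)\,\mathbf{x}(n)} \tag{12.44}$$

To simplify the subsequent steps, we define the column vector

$$\mathbf{k}(n) = \frac{\lambda^{-1}\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)\,\mathbf{x}(n)}
{1 + \lambda^{-1}\,\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)\,\mathbf{x}(n)} \tag{12.45}$$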

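The vector $\mathbf{k}(n)$ is referred to as the gain vector for reasons that will become apparent
later in this section.
Substituting Eq. (12.45) in Eq. (12.44), we obtain

$$\boldsymbol{\Psi}_\lambda^{-1}(n) = \lambda^{-1}\left(\boldsymbol{\Psi}_\lambda^{-1}(n-1)
- \mathbf{k}(n)\,\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)\right) \tag{12.46}$$

By rearranging Eq. (12.45), we get

$$\mathbf{k}(n) = \lambda^{-1}\left(\boldsymbol{\Psi}_\lambda^{-1}(n-1)
- \mathbf{k}(n)\,\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)\right)\mathbf{x}(n) \tag{12.47}$$

Using Eq. (12.46) in Eq. (12.47), we obtain

$$\mathbf{k}(n) = \boldsymbol{\Psi}_\lambda^{-1}(n)\,\mathbf{x}(n) \tag{12.48}$$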
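Next, we substitute Eq. (12.42) in Eq. (12.36) and expand to obtain

$$\begin{aligned}
\hat{\mathbf{w}}(n) &= \lambda\,\boldsymbol{\Psi}_\lambda^{-1}(n)\,\boldsymbol{\theta}_\lambda(n-1)
+ \boldsymbol{\Psi}_\lambda^{-1}(n)\,\mathbf{x}(n)d(n) \\
&= \lambda\,\boldsymbol{\Psi}_\lambda^{-1}(n)\,\boldsymbol{\theta}_\lambda(n-1) + \mathbf{k}(n)d(n)
\end{aligned} \tag{12.49}$$

where the last equality is obtained using Eq. (12.48). Substituting Eq. (12.46) in
Eq. (12.49), we get

$$\begin{aligned}
\hat{\mathbf{w}}(n) &= \boldsymbol{\Psi}_\lambda^{-1}(n-1)\,\boldsymbol{\theta}_\lambda(n-1)
- \mathbf{k}(n)\,\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)\,\boldsymbol{\theta}_\lambda(n-1)
+ \mathbf{k}(n)d(n) \\
&= \hat{\mathbf{w}}(n-1) - \mathbf{k}(n)\,\mathbf{x}^T(n)\,\hat{\mathbf{w}}(n-1) + \mathbf{k}(n)d(n) \\
&= \hat{\mathbf{w}}(n-1) + \mathbf{k}(n)\left(d(n) - \hat{\mathbf{w}}^T(n-1)\,\mathbf{x}(n)\right)
\end{aligned} \tag{12.50}$$

Finally, we define

$$\hat{e}_{n-1}(n) = d(n) - \hat{\mathbf{w}}^T(n-1)\,\mathbf{x}(n) \tag{12.51}$$

and use this in Eq. (12.50) to obtain

$$\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}(n-1) + \mathbf{k}(n)\,\hat{e}_{n-1}(n) \tag{12.52}$$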


This is the recursion used by the RLS algorithm to update $\hat{\mathbf{w}}(n)$. The amount of change
to be made in the tap weights at the nth iteration is determined by the product of the
estimation error $\hat{e}_{n-1}(n)$ and the gain vector $\mathbf{k}(n)$.

From Eq. (12.51), we note that $\hat{e}_{n-1}(n)$ is the estimation error at time "n" based on
the tap-weight vector estimated at time "n − 1", $\hat{\mathbf{w}}(n-1)$. Hence, $\hat{e}_{n-1}(n)$ is
referred to as the a priori estimation error. On the other hand, the a posteriori estimation
error is given by

$$\hat{e}_n(n) = d(n) - \hat{\mathbf{w}}^T(n)\,\mathbf{x}(n) \tag{12.53}$$

which would be obtained if the current least-squares estimate of the filter tap weights,
that is, $\hat{\mathbf{w}}(n)$, were used to calculate the filter output.

Equations (12.45), (12.51), (12.52), and (12.46), in this order, describe one iteration of
the standard RLS algorithm.


12.4.2 Initialization of the RLS Algorithm 


Actual implementation of the RLS algorithm requires proper initialization of
$\boldsymbol{\Psi}_\lambda(0)$ and $\hat{\mathbf{w}}(0)$ before the start of the algorithm. In particular,
we note that the matrix

$$\boldsymbol{\Psi}_\lambda(n) = \sum_{k=1}^{n} \lambda^{n-k}\,\mathbf{x}(k)\mathbf{x}^T(k) \tag{12.54}$$

for values of n smaller than the filter length, N, has a rank which is less than its dimension,
N. This implies that the inverse of $\boldsymbol{\Psi}_\lambda(n)$ does not exist for n < N. A simple and
commonly used solution to this problem is to start the RLS algorithm with an initial
setting of

$$\boldsymbol{\Psi}_\lambda(0) = \delta\,\mathbf{I} \tag{12.55}$$

where $\delta$ is a small positive constant. Then, iterating the recursive equation (12.41),
we obtain

$$\boldsymbol{\Psi}_\lambda(n) = \sum_{k=1}^{n} \lambda^{n-k}\,\mathbf{x}(k)\mathbf{x}^T(k)
+ \delta\lambda^{n}\,\mathbf{I} \tag{12.56}$$

We observe that, for $\lambda < 1$, the effect of $\boldsymbol{\Psi}_\lambda(0)$ reduces exponentially as n
increases. Thus, this initialization of $\boldsymbol{\Psi}_\lambda(0)$ has very little effect on the
steady-state performance of the RLS algorithm. Furthermore, the effect of
$\boldsymbol{\Psi}_\lambda(0)$ on the convergence behavior of the RLS algorithm can be minimized by
choosing a very small value for $\delta$.
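As a worked restatement of how the recursion (12.41) leads to Eq. (12.56), the first few iterations started from the initial setting δI can be written out explicitly. Nothing beyond the text is assumed here; the notation follows Eqs. (12.41) and (12.55).

```latex
\begin{aligned}
\boldsymbol{\Psi}_\lambda(1) &= \lambda\delta\,\mathbf{I} + \mathbf{x}(1)\mathbf{x}^T(1) \\
\boldsymbol{\Psi}_\lambda(2) &= \lambda^{2}\delta\,\mathbf{I} + \lambda\,\mathbf{x}(1)\mathbf{x}^T(1)
                               + \mathbf{x}(2)\mathbf{x}^T(2) \\
&\;\;\vdots \\
\boldsymbol{\Psi}_\lambda(n) &= \lambda^{n}\delta\,\mathbf{I}
                               + \sum_{k=1}^{n} \lambda^{n-k}\,\mathbf{x}(k)\mathbf{x}^T(k)
\end{aligned}
```

For instance, with a forgetting factor of 0.99 the factor multiplying δ is about 0.37 at n = 100 and about 4.3 × 10⁻⁵ at n = 1000, which illustrates how quickly the initial setting is forgotten.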

As for the initialization of the filter tap weights, it is common practice to set

$$\hat{\mathbf{w}}(0) = \mathbf{0} \tag{12.57}$$

where $\mathbf{0}$ is the N-by-1 zero vector. However, setting $\hat{\mathbf{w}}(0)$ equal to an arbitrary
nonzero vector also has no significant effect on the convergence and steady-state
behavior of the RLS algorithm, provided that the elements of $\hat{\mathbf{w}}(0)$ are not very large. The
effect of a nonzero selection of $\hat{\mathbf{w}}(0)$ is studied in Problem P12.6.


12.4.3. Summary of the Standard RLS Algorithm 


Table 12.1 presents a summary of the standard RLS algorithm. This is one of several
possible implementations of the RLS algorithm. It exploits the special form of the gain
vector, $\mathbf{k}(n)$, to simplify its computation using an intermediate vector
$\mathbf{u}(n) = \boldsymbol{\Psi}_\lambda^{-1}(n-1)\,\mathbf{x}(n)$. We also note that by multiplying the
numerator and denominator of the right-hand side of Eq. (12.45) by $\lambda$ and using this definition
of $\mathbf{u}(n)$, we obtain

$$\mathbf{k}(n) = \frac{1}{\lambda + \mathbf{x}^T(n)\,\mathbf{u}(n)}\,\mathbf{u}(n)$$

Careful examination of Table 12.1 reveals that the computational complexity of this
implementation is mainly determined by:


1. Computation of the vector $\mathbf{u}(n)$.

2. Computation of $\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)$ in the
$\boldsymbol{\Psi}_\lambda^{-1}(n)$ update equation.

3. Computation of the outer product of $\mathbf{k}(n)$ and
$\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)$ in the $\boldsymbol{\Psi}_\lambda^{-1}(n)$ update
equation.

4. Subtraction of the two terms within the brackets in the $\boldsymbol{\Psi}_\lambda^{-1}(n)$ update
equation, and scaling of the result by $\lambda^{-1}$.


Each of these steps requires $N^2$ multiplications. In addition, steps 1, 2, and 4 require $N^2$
additions/subtractions each. This brings the total computational complexity of the RLS
algorithm of Table 12.1 to about $4N^2$ multiplications and $3N^2$ additions/subtractions.

Table 12.1 Summary of the standard RLS algorithm (version I).


Input:  Tap-weight vector estimate, $\hat{\mathbf{w}}(n-1)$,
        input vector, $\mathbf{x}(n)$, desired output, $d(n)$,
        and the matrix $\boldsymbol{\Psi}_\lambda^{-1}(n-1)$

Output: Filter output, $\hat{y}_{n-1}(n)$,
        tap-weight vector update, $\hat{\mathbf{w}}(n)$,
        and the updated matrix $\boldsymbol{\Psi}_\lambda^{-1}(n)$


1. Computation of the gain vector:
   $$\mathbf{u}(n) = \boldsymbol{\Psi}_\lambda^{-1}(n-1)\,\mathbf{x}(n)$$
   $$\mathbf{k}(n) = \frac{1}{\lambda + \mathbf{x}^T(n)\,\mathbf{u}(n)}\,\mathbf{u}(n)$$

2. Filtering:
   $$\hat{y}_{n-1}(n) = \hat{\mathbf{w}}^T(n-1)\,\mathbf{x}(n)$$

3. Error estimation:
   $$\hat{e}_{n-1}(n) = d(n) - \hat{y}_{n-1}(n)$$

4. Tap-weight vector adaptation:
   $$\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}(n-1) + \mathbf{k}(n)\,\hat{e}_{n-1}(n)$$

5. $\boldsymbol{\Psi}_\lambda^{-1}(n)$ update:
   $$\boldsymbol{\Psi}_\lambda^{-1}(n) = \lambda^{-1}\left(\boldsymbol{\Psi}_\lambda^{-1}(n-1)
   - \mathbf{k}(n)\left[\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1)\right]\right)$$
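The five steps of Table 12.1 translate almost line by line into code. The following is a minimal pure-Python sketch, not a tuned implementation: the function name `rls_step`, the two-tap plant `w_o`, and the values of `delta` and `lam` are assumptions made for the illustration, and the plant is taken to be noise-free.

```python
import random

def rls_step(w, P, x, d, lam):
    """One iteration of Table 12.1.

    w: tap weights w(n-1); P: inverse matrix Psi^{-1}(n-1) as a list of rows;
    x: tap-input vector x(n); d: desired output d(n); lam: forgetting factor.
    Returns the a priori error, w(n), and Psi^{-1}(n).
    """
    N = len(w)
    # Step 1: gain vector k(n) = u(n) / (lam + x^T(n) u(n)).
    u = [sum(P[i][j] * x[j] for j in range(N)) for i in range(N)]
    denom = lam + sum(x[i] * u[i] for i in range(N))
    k = [ui / denom for ui in u]
    # Steps 2 and 3: filter output and a priori estimation error.
    y = sum(w[i] * x[i] for i in range(N))
    e = d - y
    # Step 4: tap-weight vector adaptation.
    w_new = [w[i] + k[i] * e for i in range(N)]
    # Step 5: Psi^{-1}(n) = (Psi^{-1}(n-1) - k(n) [x^T(n) Psi^{-1}(n-1)]) / lam.
    xP = [sum(x[i] * P[i][j] for i in range(N)) for j in range(N)]
    P_new = [[(P[i][j] - k[i] * xP[j]) / lam for j in range(N)]
             for i in range(N)]
    return e, w_new, P_new

# Noise-free identification of a hypothetical two-tap plant.
rng = random.Random(0)
w_o = [0.5, -0.3]
delta, lam = 1e-4, 0.99
w = [0.0, 0.0]                                 # w(0) = 0
P = [[1 / delta, 0.0], [0.0, 1 / delta]]       # Psi^{-1}(0) = I / delta
x = [0.0, 0.0]
for n in range(50):
    x = [rng.gauss(0.0, 1.0), x[0]]            # tapped-delay-line input
    d = w_o[0] * x[0] + w_o[1] * x[1]
    e, w, P = rls_step(w, P, x, d, lam)
print([round(v, 4) for v in w])
```

With no plant noise, the estimate should settle very close to `w_o` after a handful of iterations; the small residual comes from the δ-dependent bias introduced by the initialization.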


The fact that $\boldsymbol{\Psi}_\lambda(n)$ is a symmetric matrix can be used to reduce the computational
complexity of the RLS algorithm. Using this, we find that
$\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1) = (\boldsymbol{\Psi}_\lambda^{-1}(n-1)\,\mathbf{x}(n))^T
= \mathbf{u}^T(n)$. The last step of Table 12.1 may then be simplified as

$$\boldsymbol{\Psi}_\lambda^{-1}(n) = \lambda^{-1}\left(\boldsymbol{\Psi}_\lambda^{-1}(n-1)
- \mathbf{k}(n)\,\mathbf{u}^T(n)\right) \tag{12.58}$$

This amendment, although mathematically exact, results in an implementation that is unusable
in practice. Computer simulations and theoretical analysis (Verhaegen, 1989) show
that this amended version of the RLS algorithm is numerically unstable. This behavior of
the RLS algorithm is due to the accumulation of roundoff errors, which makes
$\boldsymbol{\Psi}_\lambda^{-1}(n-1)$ nonsymmetric. This in turn invalidates the assumption
$\mathbf{x}^T(n)\,\boldsymbol{\Psi}_\lambda^{-1}(n-1) = \mathbf{u}^T(n)$, which was used to introduce the
above amendment.

To resolve this problem and come up with an efficient and stable implementation
of the RLS algorithm, one may compute only the upper or lower triangular part of
$\boldsymbol{\Psi}_\lambda^{-1}(n)$ according to Eq. (12.58) and copy the result to obtain the rest of the
elements of $\boldsymbol{\Psi}_\lambda^{-1}(n)$, thereby preserving its symmetric structure (Verhaegen, 1989;
Yang, 1994). Table 12.2 presents a summary of this implementation of the RLS algorithm. Here, the
operator Tri{·} signifies that the computation of $\boldsymbol{\Psi}_\lambda^{-1}(n)$ is based on either
the upper or lower triangular part. Clearly, this results in a significant saving in computations, as a
large portion of the complexity of the RLS algorithm arises from the computation of
$\boldsymbol{\Psi}_\lambda^{-1}(n)$. Basically, the computational complexity of steps 3 and 4 above
(corresponding to the $\boldsymbol{\Psi}_\lambda^{-1}(n)$ update) is halved and step 2 is eliminated. This
brings the computational complexity of the RLS algorithm in Table 12.2 down to about $2N^2$
multiplications and $1.5N^2$ additions/subtractions, which is about half of that of the algorithm of
Table 12.1.

Table 12.2 Summary of the standard RLS algorithm (version II).


Input:  Tap-weight vector estimate, $\hat{\mathbf{w}}(n-1)$,
        input vector, $\mathbf{x}(n)$, desired output, $d(n)$,
        and the matrix $\boldsymbol{\Psi}_\lambda^{-1}(n-1)$

Output: Filter output, $\hat{y}_{n-1}(n)$,
        tap-weight vector update, $\hat{\mathbf{w}}(n)$,
        and the updated matrix $\boldsymbol{\Psi}_\lambda^{-1}(n)$


1. Computation of the gain vector:
   $$\mathbf{u}(n) = \boldsymbol{\Psi}_\lambda^{-1}(n-1)\,\mathbf{x}(n)$$
   $$\mathbf{k}(n) = \frac{1}{\lambda + \mathbf{x}^T(n)\,\mathbf{u}(n)}\,\mathbf{u}(n)$$

2. Filtering:
   $$\hat{y}_{n-1}(n) = \hat{\mathbf{w}}^T(n-1)\,\mathbf{x}(n)$$

3. Error estimation:
   $$\hat{e}_{n-1}(n) = d(n) - \hat{y}_{n-1}(n)$$

4. Tap-weight vector adaptation:
   $$\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}(n-1) + \mathbf{k}(n)\,\hat{e}_{n-1}(n)$$

5. $\boldsymbol{\Psi}_\lambda^{-1}(n)$ update:
   $$\boldsymbol{\Psi}_\lambda^{-1}(n) = \mathrm{Tri}\left\{\lambda^{-1}\left(\boldsymbol{\Psi}_\lambda^{-1}(n-1)
   - \mathbf{k}(n)\,\mathbf{u}^T(n)\right)\right\}$$
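One way to read the Tri{·} operation of Table 12.2 is sketched below: only the upper triangle of the updated inverse matrix is computed from Eq. (12.58), and each entry is mirrored into the lower triangle. The function name `update_inverse_tri` and the numerical values are assumptions for illustration; this is not necessarily how the cited references organize the computation.

```python
def update_inverse_tri(P, k, u, lam):
    """Psi^{-1}(n) = Tri{ (Psi^{-1}(n-1) - k(n) u^T(n)) / lam }.

    Only the upper triangle is computed; the lower triangle is a copy,
    so the result is symmetric by construction.
    """
    N = len(P)
    P_new = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i, N):
            P_new[i][j] = (P[i][j] - k[i] * u[j]) / lam
            P_new[j][i] = P_new[i][j]
    return P_new

# Illustrative values (assumptions, not taken from the text).
lam = 0.95
P = [[2.0, 0.5], [0.5, 1.0]]     # symmetric Psi^{-1}(n-1)
x = [1.0, -1.0]                  # tap-input vector x(n)
u = [sum(P[i][j] * x[j] for j in range(2)) for i in range(2)]
denom = lam + sum(x[i] * u[i] for i in range(2))
k = [ui / denom for ui in u]
P_new = update_inverse_tri(P, k, u, lam)
print(P_new)
```

Because roughly half of the entries are computed and the rest copied, the outer product and the subtraction/scaling each cost about half as much as in the full update, in line with the operation counts quoted above.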

From the above discussion, one may perceive a potential problem of the RLS
algorithm. It is indeed true that roundoff errors may accumulate and result in undesirable
behavior for any algorithm that works on the basis of recursive update equations.
This statement is general and applicable to all LMS and least-squares-based algorithms.
However, the problem turns out to be more serious in the case of least-squares-based
algorithms. See Cioffi (1987) for an excellent qualitative discussion on the roundoff error
in various adaptive filtering algorithms. Engineers who use adaptive filtering algorithms
should be aware of this potential problem and must evaluate the algorithms on this issue
before committing to a practical implementation.


12.5 Convergence Behavior of the RLS Algorithm 


In this section, we study the convergence behavior of the RLS algorithm in the context
of a system modeling problem. As the plant, we consider a linear multiple regressor
characterized by the equation

$$d(n) = \mathbf{w}_o^T\,\mathbf{x}(n) + e_o(n) \tag{12.59}$$

where $\mathbf{w}_o$ is the regressor tap-weight vector, $\mathbf{x}(n)$ is the tap-input vector, $e_o(n)$
is the plant noise, and $d(n)$ is the plant output. The noise samples, $e_o(n)$, are assumed to be
zero-mean and white, and independent of the input samples, $\mathbf{x}(n)$. The tap-input vector
$\mathbf{x}(n)$ is also applied to an adaptive filter whose tap-weight vector, $\hat{\mathbf{w}}(n)$, is
adapted so that the difference between its output, $y(n) = \hat{\mathbf{w}}^T(n)\,\mathbf{x}(n)$, and the
plant output, $d(n)$, is minimized in the least-squares sense.

The derivations that follow use the vector/matrix formulation adopted in the previous
sections. In particular, we note that with the definitions (12.4) and (12.7),

$$\mathbf{d}(n) = \mathbf{X}^T(n)\,\mathbf{w}_o + \mathbf{e}_o(n) \tag{12.60}$$

where $\mathbf{e}_o(n) = [e_o(1) \;\; e_o(2) \;\; \cdots \;\; e_o(n)]^T$.


12.5.1 Average Tap-Weight Behavior of the RLS Algorithm 


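We show that the least-squares estimate $\hat{\mathbf{w}}(n)$ is an unbiased estimate of the tap-weight
vector $\mathbf{w}_o$.
From Eqs. (12.36)-(12.38), we obtain

$$\hat{\mathbf{w}}(n) = \left(\mathbf{X}(n)\,\boldsymbol{\Lambda}(n)\,\mathbf{X}^T(n)\right)^{-1}
\mathbf{X}(n)\,\boldsymbol{\Lambda}(n)\,\mathbf{d}(n) \tag{12.61}$$

Substituting Eq. (12.60) in Eq. (12.61), and using Eq. (12.37), we get

$$\hat{\mathbf{w}}(n) = \mathbf{w}_o + \boldsymbol{\Psi}_\lambda^{-1}(n)\,\mathbf{X}(n)\,
\boldsymbol{\Lambda}(n)\,\mathbf{e}_o(n) \tag{12.62}$$

Taking expectation on both sides of Eq. (12.62) and recalling that $\mathbf{X}(n)$ and
$\mathbf{e}_o(n)$ are independent of each other, we obtain

$$\begin{aligned}
E[\hat{\mathbf{w}}(n)] &= \mathbf{w}_o + E[\boldsymbol{\Psi}_\lambda^{-1}(n)\,\mathbf{X}(n)]\,
\boldsymbol{\Lambda}(n)\,E[\mathbf{e}_o(n)] \\
&= \mathbf{w}_o
\end{aligned} \tag{12.63}$$

where the last equality follows from the fact that $e_o(n)$ is a zero-mean process, that is,
$E[e_o(n)] = 0$, for all values of n. This result shows that $\hat{\mathbf{w}}(n)$ is an unbiased estimate
of $\mathbf{w}_o$.

The above derivation does not include the effect of initialization, that is,
$\boldsymbol{\Psi}_\lambda^{-1}(0) = \delta^{-1}\mathbf{I}$, which is required for proper operation of the
RLS algorithm. This initialization introduces some bias in $\hat{\mathbf{w}}(n)$ that is proportional to
$\delta$ and decreases as n increases (Problem P12.3).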


12.5.2 Weight-Error Correlation Matrix 
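Let us define the weight-error vector

$$\mathbf{v}(n) = \hat{\mathbf{w}}(n) - \mathbf{w}_o \tag{12.64}$$

From Eq. (12.62), we obtain

$$\mathbf{v}(n) = \boldsymbol{\Psi}_\lambda^{-1}(n)\,\mathbf{X}(n)\,\boldsymbol{\Lambda}(n)\,\mathbf{e}_o(n) \tag{12.65}$$

We also define the weight-error correlation matrix

$$\mathbf{K}(n) = E[\mathbf{v}(n)\,\mathbf{v}^T(n)] \tag{12.66}$$

Substituting Eq. (12.65) in Eq. (12.66) and noting that
$(\boldsymbol{\Psi}_\lambda^{-1}(n))^T = \boldsymbol{\Psi}_\lambda^{-1}(n)$ and
$\boldsymbol{\Lambda}^T(n) = \boldsymbol{\Lambda}(n)$, we obtain

$$\mathbf{K}(n) = E[\boldsymbol{\Psi}_\lambda^{-1}(n)\,\mathbf{X}(n)\,\boldsymbol{\Lambda}(n)\,
\mathbf{e}_o(n)\,\mathbf{e}_o^T(n)\,\boldsymbol{\Lambda}(n)\,\mathbf{X}^T(n)\,
\boldsymbol{\Psi}_\lambda^{-1}(n)] \tag{12.67}$$

Recalling the independence of $e_o(n)$ and $\mathbf{x}(n)$, from Eq. (12.67), we obtain

$$\mathbf{K}(n) = E[\boldsymbol{\Psi}_\lambda^{-1}(n)\,\mathbf{X}(n)\,\boldsymbol{\Lambda}(n)\,
E[\mathbf{e}_o(n)\,\mathbf{e}_o^T(n)]\,\boldsymbol{\Lambda}(n)\,\mathbf{X}^T(n)\,
\boldsymbol{\Psi}_\lambda^{-1}(n)] \tag{12.68}$$

Since $e_o(n)$ is a white noise process,

$$E[\mathbf{e}_o(n)\,\mathbf{e}_o^T(n)] = \sigma_o^2\,\mathbf{I} \tag{12.69}$$

where $\sigma_o^2$ is the variance of $e_o(n)$, and $\mathbf{I}$ is the identity matrix with appropriate
dimension. Finally, substituting Eq. (12.69) in Eq. (12.68), we get

$$\mathbf{K}(n) = \sigma_o^2\,E[\boldsymbol{\Psi}_\lambda^{-1}(n)\,\boldsymbol{\Psi}_{\lambda^2}(n)\,
\boldsymbol{\Psi}_\lambda^{-1}(n)] \tag{12.70}$$

where $\boldsymbol{\Psi}_{\lambda^2}(n) = \mathbf{X}(n)\,\boldsymbol{\Lambda}^2(n)\,\mathbf{X}^T(n)$.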
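Rigorous evaluation of Eq. (12.70) is a difficult task. Hence, we make the following
assumptions to facilitate an approximate evaluation of $\mathbf{K}(n)$:


1. The observed input vectors $\mathbf{x}(1), \mathbf{x}(2), \ldots, \mathbf{x}(n)$ constitute the samples
of an ergodic process. Thus, the time averages may be used instead of the ensemble averages.

2. The forgetting factor $\lambda$ is very close to 1.

3. The time "n" at which $\mathbf{K}(n)$ is evaluated is large.


We note from Eq. (12.39) that $\boldsymbol{\Psi}_\lambda(n)$ is a weighted sum of the outer products
$\mathbf{x}(n)\mathbf{x}^T(n)$, $\mathbf{x}(n-1)\mathbf{x}^T(n-1)$, $\mathbf{x}(n-2)\mathbf{x}^T(n-2)$,
.... Thus, considering the above assumptions, we find that

$$\boldsymbol{\Psi}_\lambda(n) \approx \frac{1 - \lambda^n}{1 - \lambda}\,\mathbf{R} \tag{12.71}$$

where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$ is the correlation matrix of the input.
Substituting Eq. (12.71) in Eq. (12.70), and applying the same approximation to
$\boldsymbol{\Psi}_{\lambda^2}(n)$ with $\lambda$ replaced by $\lambda^2$, we obtain

$$\begin{aligned}
\mathbf{K}(n) &= \sigma_o^2 \left(\frac{1 - \lambda}{1 - \lambda^n}\right)^2
\frac{1 - \lambda^{2n}}{1 - \lambda^2}\,\mathbf{R}^{-1} \\
&= \sigma_o^2\,\frac{1 - \lambda}{1 + \lambda}\cdot\frac{1 + \lambda^n}{1 - \lambda^n}\,\mathbf{R}^{-1}
\end{aligned} \tag{12.72}$$

In the steady state, that is, when $n \to \infty$, we obtain from Eq. (12.72)

$$\mathbf{K}(\infty) = \sigma_o^2\,\frac{1 - \lambda}{1 + \lambda}\,\mathbf{R}^{-1} \tag{12.73}$$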

12.5.3 Learning Curve 


From the summary of the RLS algorithm in Table 12.1 (or Table 12.2), we find that the
filter output at time "n" is obtained according to the equation

$$\hat{y}_{n-1}(n) = \hat{\mathbf{w}}^T(n-1)\,\mathbf{x}(n) \tag{12.74}$$

Accordingly, the learning curve of the RLS algorithm is defined in terms of the a priori
error $\hat{e}_{n-1}(n)$ as

$$\xi_{n-1}(n) = E[\hat{e}_{n-1}^2(n)] \tag{12.75}$$

To evaluate $\xi_{n-1}(n)$, we proceed as follows.
Using Eqs. (12.64) and (12.59), we obtain

$$\begin{aligned}
\hat{e}_{n-1}(n) &= d(n) - \hat{\mathbf{w}}^T(n-1)\,\mathbf{x}(n) \\
&= d(n) - \mathbf{w}_o^T\,\mathbf{x}(n) - \mathbf{v}^T(n-1)\,\mathbf{x}(n) \\
&= e_o(n) - \mathbf{v}^T(n-1)\,\mathbf{x}(n)
\end{aligned} \tag{12.76}$$

Substituting Eq. (12.76) in Eq. (12.75), we obtain

$$\begin{aligned}
\xi_{n-1}(n) &= E[(e_o(n) - \mathbf{v}^T(n-1)\,\mathbf{x}(n))^2] \\
&= E[e_o^2(n)] - 2E[\mathbf{v}^T(n-1)\,\mathbf{x}(n)\,e_o(n)] \\
&\quad + E[\mathbf{v}^T(n-1)\,\mathbf{x}(n)\,\mathbf{x}^T(n)\,\mathbf{v}(n-1)]
\end{aligned} \tag{12.77}$$

where the last term is obtained by noting that
$\mathbf{v}^T(n-1)\,\mathbf{x}(n) = \mathbf{x}^T(n)\,\mathbf{v}(n-1)$.

To simplify the evaluation of the last two terms on the right-hand side of Eq. (12.77),
we make use of the assumptions we have made on $e_o(n)$ and $\mathbf{x}(n)$. First, $e_o(n)$ is a
zero-mean white process and second, $e_o(n)$ and $\mathbf{x}(n)$ are independent. Consequently, $e_o(n)$
is also independent of $\mathbf{v}^T(n-1)\,\mathbf{x}(n)$, as $\mathbf{v}(n-1)$ depends only on the past
observations, which include only the past samples of $e_o(n)$. Noting these, we obtain

$$E[\mathbf{v}^T(n-1)\,\mathbf{x}(n)\,e_o(n)] = E[\mathbf{v}^T(n-1)\,\mathbf{x}(n)]\,E[e_o(n)] = 0 \tag{12.78}$$

as $E[e_o(n)] = 0$.

To simplify the third term on the right-hand side of Eq. (12.77), we also assume that
$\mathbf{v}(n-1)$ and $\mathbf{x}(n)$ are independent. This is similar to the independence assumption in
the case of the LMS algorithm (Chapter 6). Strictly speaking, the latter assumption is hard to
justify, as $\mathbf{v}(n-1)$ depends on the past samples of $\mathbf{x}(n)$, and the current
$\mathbf{x}(n)$ may not be independent of its past samples. However, when n is large,
$\mathbf{v}(n-1)$, which is determined by a large number of past observations of $\mathbf{x}(n)$, only
weakly depends on the present sample of $\mathbf{x}(n)$. This is because the older samples of
$\mathbf{x}(n)$ are only loosely dependent on the present $\mathbf{x}(n)$ unless $\mathbf{x}(n)$ contains
one or more significant narrow-band components. We exclude such special cases in our study here and
assume that $\mathbf{v}(n-1)$ and $\mathbf{x}(n)$ are independent of each other. This is referred to as
the independence assumption, in analogy with the independence assumption used in the case of the LMS
algorithm. However, we note that unlike the LMS algorithm, for which the independence assumption could
be used for all (small and large) values of the time index n, in the case of the RLS algorithm the
independence assumption is valid only for large values of n.

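Using the independence assumption, we get

$$\begin{aligned}
E[\mathbf{v}^T(n-1)\,\mathbf{x}(n)\,\mathbf{x}^T(n)\,\mathbf{v}(n-1)]
&= E[\mathbf{v}^T(n-1)\,E[\mathbf{x}(n)\,\mathbf{x}^T(n)]\,\mathbf{v}(n-1)] \\
&= E[\mathbf{v}^T(n-1)\,\mathbf{R}\,\mathbf{v}(n-1)]
\end{aligned} \tag{12.79}$$

Next, we note that $E[\mathbf{v}^T(n-1)\,\mathbf{R}\,\mathbf{v}(n-1)]$ is a scalar and proceed as follows:

$$\begin{aligned}
E[\mathbf{v}^T(n-1)\,\mathbf{R}\,\mathbf{v}(n-1)]
&= \mathrm{tr}[E[\mathbf{v}^T(n-1)\,\mathbf{R}\,\mathbf{v}(n-1)]] \\
&= E[\mathrm{tr}[\mathbf{v}^T(n-1)\,\mathbf{R}\,\mathbf{v}(n-1)]] \\
&= E[\mathrm{tr}[\mathbf{v}(n-1)\,\mathbf{v}^T(n-1)\,\mathbf{R}]] \\
&= \mathrm{tr}[E[\mathbf{v}(n-1)\,\mathbf{v}^T(n-1)]\,\mathbf{R}] \\
&= \mathrm{tr}[\mathbf{K}(n-1)\,\mathbf{R}]
\end{aligned} \tag{12.80}$$

where tr[·] denotes the trace of the indicated matrix. In the derivation of Eq. (12.80),
we have used the linearity property of the expectation and trace operators, the definition
(12.66), and the identity

$$\mathrm{tr}[\mathbf{A}\mathbf{B}] = \mathrm{tr}[\mathbf{B}\mathbf{A}]$$

which is valid for any pair of matrices $\mathbf{A}$ and $\mathbf{B}$ of dimensions N-by-M and M-by-N,
respectively.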
Substituting Eqs. (12.80) and (12.78) in Eq. (12.77), we get

$$\xi_{n-1}(n) = \xi_{\min} + \mathrm{tr}[\mathbf{K}(n-1)\,\mathbf{R}] \tag{12.81}$$

where $\xi_{\min} = E[e_o^2(n)]$ is the minimum MSE of the filter, which is achieved when a perfect
estimate of $\mathbf{w}_o$ is available. Substituting Eq. (12.72) in Eq. (12.81), we obtain

$$\xi_{n-1}(n) = \xi_{\min} + \frac{1 - \lambda}{1 + \lambda}\cdot
\frac{1 + \lambda^{n-1}}{1 - \lambda^{n-1}}\,N\,\xi_{\min} \tag{12.82}$$

This describes the learning curve of the RLS algorithm. Note that we made the
assumption of n being large in the derivation of Eq. (12.82). Thus, Eq. (12.82) can
predict the behavior of the RLS algorithm only after a certain initial transient period.
Some comments on the behavior of the RLS algorithm during its initial transient period
will be given later in this section.

At this point, it is instructive to elaborate on the behavior of the RLS algorithm,
as predicted by Eq. (12.82). We note that the second term on the right-hand side of
Eq. (12.82) is a positive value, indicating the deviation of $\xi_{n-1}(n)$ from $\xi_{\min}$. This term
converges toward its final value as n grows. The speed at which this term converges is
determined by the exponential term $\lambda^{n-1}$, or equivalently $\lambda^n$. Accordingly, we define
the time constant $\tau_{\mathrm{RLS}}$ associated with the RLS algorithm using the following equation:

$$\lambda^n = e^{-n/\tau_{\mathrm{RLS}}} \tag{12.83}$$


Solving this for $\tau_{\mathrm{RLS}}$, we obtain

$$\tau_{\mathrm{RLS}} = -\frac{1}{\ln \lambda} \tag{12.84}$$

To simplify this, we use the following approximation:

$$\ln(1 + x) \approx x, \quad \text{for } |x| \ll 1 \tag{12.85}$$

We note that $0 < 1 - \lambda \ll 1$, as $\lambda$ is smaller than, but close to, 1. Using this in
Eq. (12.85), we get

$$\ln \lambda = \ln(1 - (1 - \lambda)) \approx -(1 - \lambda) \tag{12.86}$$

Substituting Eq. (12.86) in Eq. (12.84), we obtain

$$\tau_{\mathrm{RLS}} \approx \frac{1}{1 - \lambda} \tag{12.87}$$
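The quality of the approximation in Eq. (12.87) is easy to gauge numerically. The sketch below compares it with the exact expression of Eq. (12.84) for a few illustrative forgetting factors; the function names are introduced here for convenience.

```python
import math

def tau_exact(lam):
    """Time constant from Eq. (12.84): -1 / ln(lam)."""
    return -1.0 / math.log(lam)

def tau_approx(lam):
    """Approximate time constant from Eq. (12.87): 1 / (1 - lam)."""
    return 1.0 / (1.0 - lam)

for lam in (0.9, 0.99, 0.999):
    print(lam, round(tau_exact(lam), 2), round(tau_approx(lam), 2))
```

The two values differ by roughly half an iteration in each case, so the relative error shrinks as the forgetting factor approaches 1, which is the regime assumed in the derivation.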


We thus note that the convergence behavior of the RLS algorithm is controlled by only
a single mode of convergence. Unlike the LMS algorithm, whose convergence behavior
is affected by the eigenvalues of the correlation matrix, $\mathbf{R}$, of the filter input, the above
results show that the convergence behavior of the RLS algorithm is independent of the
eigenvalues of $\mathbf{R}$. This may be explained by substituting Eq. (12.48) in the RLS recursion
(12.52) to obtain

$$\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}(n-1) + \boldsymbol{\Psi}_\lambda^{-1}(n)\,\hat{e}_{n-1}(n)\,\mathbf{x}(n) \tag{12.88}$$

When n is large, we may use the approximation (12.71) in Eq. (12.88) to get

$$\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}(n-1) + \frac{1 - \lambda}{1 - \lambda^n}\,\mathbf{R}^{-1}\,
\hat{e}_{n-1}(n)\,\mathbf{x}(n) \tag{12.89}$$

When n is large such that $\lambda^n \ll 1$, the above recursion simplifies to

$$\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}(n-1) + (1 - \lambda)\,\mathbf{R}^{-1}\,\hat{e}_{n-1}(n)\,\mathbf{x}(n) \tag{12.90}$$

Letting $\lambda = 1 - 2\mu$, we find that this is nothing but the LMS-Newton algorithm, which
was introduced in Chapter 7 (Section 7.5), and whose convergence behavior was found to be
independent of the eigenvalues of $\mathbf{R}$.

At this point, we remind the reader once again that most of the results developed
above are valid only when n is large. In particular, the similarity between the RLS and
LMS-Newton algorithms noted above is valid in the sense that, for large
values of n, the RLS recursion approaches an update equation which is similar to the
LMS-Newton recursion. However, this does not mean that the RLS and LMS-Newton
algorithms have the same convergence behavior, as the convergence behaviors of the two
algorithms are completely different when n is small. This is further illustrated through
the simulation results, which are presented in Section 12.5.5.


12.5.4 Excess MSE and Misadjustment 


As in Chapter 6, we define the excess MSE of the RLS algorithm as the difference between
its steady-state MSE and the minimum achievable MSE. In other words, we define

$$\text{Excess MSE} = \lim_{n \to \infty} \xi_{n-1}(n) - \xi_{\min} \tag{12.91}$$

Using Eq. (12.82) in Eq. (12.91), we obtain

$$\text{Excess MSE} = \frac{1 - \lambda}{1 + \lambda}\,N\,\xi_{\min} \tag{12.92}$$

As in the case of the LMS algorithm, the misadjustment of the RLS algorithm is given by

$$\mathcal{M}_{\mathrm{RLS}} = \frac{\text{Excess MSE}}{\xi_{\min}} \tag{12.93}$$

Substituting Eq. (12.92) in Eq. (12.93), we obtain

$$\mathcal{M}_{\mathrm{RLS}} = \frac{1 - \lambda}{1 + \lambda}\,N \tag{12.94}$$
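Equation (12.94) can be inverted to choose the forgetting factor that yields a prescribed misadjustment: solving for λ gives λ = (N − M)/(N + M), where M denotes the target misadjustment. The sketch below applies this to a 15-tap filter and a 10% target; the function names are introduced here for illustration.

```python
def misadjustment(lam, N):
    """Misadjustment of the RLS algorithm, Eq. (12.94)."""
    return (1.0 - lam) / (1.0 + lam) * N

def forgetting_factor(M, N):
    """Forgetting factor that gives misadjustment M, from inverting Eq. (12.94)."""
    return (N - M) / (N + M)

N = 15                              # filter length
lam = forgetting_factor(0.10, N)    # target misadjustment of 10%
print(round(lam, 5), round(misadjustment(lam, N), 5))
```

For these values the forgetting factor comes out at roughly 0.987, which by Eq. (12.87) corresponds to a time constant of about 75 iterations.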


12.5.5 Initial Transient Behavior of the RLS Algorithm 


Much of the benefit of the RLS algorithm is attributed to the fact that it shows very fast
convergence when it is started from a rest condition with the initial values of
$\hat{\mathbf{w}}(0) = \mathbf{0}$ and $\boldsymbol{\Psi}_\lambda^{-1}(0) = \delta^{-1}\mathbf{I}$. This fast
convergence is observed only after the first N samples of the input and desired output sequences are
processed. In typical implementations of the RLS algorithm, we find that the MSE of the adaptive filter
converges to a level close to its minimum value within a small number of iterations (usually two to
three times the filter length) and then proceeds with a fine-tuning process that may last much longer
before the MSE reaches its steady-state value. The initial transient behavior of the RLS
algorithm can be best explained through a numerical example (computer simulation).

As a numerical example, we apply the RLS algorithm to the system modeling
setup of Section 6.4.1, so that a comparison of the RLS and LMS algorithms can be
made. Figure 12.1 presents the schematic diagram of the modeling setup. The common
input, $x(n)$, to the plant, $W_o(z)$, and adaptive filter, $W(z)$, is obtained by passing a unit
variance white Gaussian sequence, $\nu(n)$, through a filter with the system function $H(z)$.
The plant noise, as before, is denoted by $e_o(n)$. It is assumed to be a white noise process
independent of $x(n)$.

For our experiments, as in Section 6.4.1, we select $\sigma_o^2 = 0.001$ and

$$W_o(z) = \sum_{i=0}^{7} z^{-i} - \sum_{i=8}^{14} z^{-i} \tag{12.95}$$

The length of the adaptive filter, N, is chosen equal to the length of $W_o(z)$, that is, N = 15.
Also, we present results of simulations for two choices of input that are characterized by

$$H(z) = H_1(z) = 0.35 + z^{-1} - 0.35z^{-2} \tag{12.96}$$

and

$$H(z) = H_2(z) = 0.35 + z^{-1} + 0.35z^{-2} \tag{12.97}$$


Figure 12.1 Adaptive modeling of an FIR plant.

The first choice results in an input, $x(n)$, whose corresponding correlation matrix has an
eigenvalue spread of 1.45. This is close to white input. On the contrary, the second choice
of $H(z)$ results in a highly colored input with an associated eigenvalue spread of 28.7
(Section 6.4.1).

Figure 12.2a and b shows the learning curves of the RLS algorithm for the two choices
of the input. Each plot is obtained by averaging over 100 independent simulation runs. In
all the runs, the RLS algorithm was started with zero initial tap weights, and the parameter
$\delta$ (used to initialize $\boldsymbol{\Psi}_\lambda^{-1}(0) = \delta^{-1}\mathbf{I}$) was set to 0.0001. The
forgetting factor $\lambda$ was chosen according to Eq. (12.94) to achieve a misadjustment of 10%. From
Figure 12.2, we see that the convergence behavior of the RLS algorithm is independent of the eigenvalue
spread of the correlation matrix, $\mathbf{R}$, of the filter input. This is in line with the theoretical
predictions of the last section. Furthermore, we find that the learning curves of the RLS
algorithm are quite different from the learning curves of the LMS algorithm (compare
the learning curves of Figure 12.2a and b with their LMS counterparts in Figure 6.7a
and b, respectively).

Each learning curve of the RLS algorithm may be divided into three distinct parts: 


1. During the first N iterations (N is the filter length), the MSE remains almost unchanged 
at a high level. 

2. The MSE converges at a very fast rate once the iteration number, n, exceeds N. 

3. After this period of fast convergence, the RLS algorithm converges toward its steady 
state at a much slower rate. 


The three separate parts of the learning curves of the RLS algorithm may be explained 
as follows. 

During the first N − 1 iterations, that is, when n < N, there are an infinite number of
possible choices of the tap-weight vector $\hat{\mathbf{w}}(n)$ that satisfy the set of equations

$$\hat{\mathbf{w}}^T(n)\,\mathbf{x}(k) = d(k), \quad \text{for } k = 1, 2, \ldots, n \tag{12.98}$$

as there are fewer equations than unknown tap weights. This
ambiguity in the solution of the least-squares problem is manifested in the coefficient
matrix $\boldsymbol{\Psi}_\lambda(n)$, whose rank remains less than its dimension, N. This rank deficiency
problem, as suggested before, is solved by initializing $\boldsymbol{\Psi}_\lambda(0)$ to a positive
definite matrix, $\delta\mathbf{I}$, which results in a full-rank $\boldsymbol{\Psi}_\lambda(n)$, and thus
a solution for $\hat{\mathbf{w}}(n)$ with no ambiguity. However, the resulting solution, although it
satisfies Eq. (12.98) to within a very good approximation, may not give an accurate estimate of the
true tap weights of the plant, $\mathbf{w}_o$. As a result, one finds that during the first N − 1
iterations, while the a posteriori error, $\hat{e}_n(n)$, is very small, the a priori error,
$\hat{e}_{n-1}(n)$, may still be large. This initial behavior of the RLS algorithm also explains why
the definition of its learning curve uses $\hat{e}_{n-1}(n)$ and not $\hat{e}_n(n)$. Clearly, the
latter does not reflect the fact that the least-squares estimate $\hat{\mathbf{w}}(n)$ may be far from
its true value, $\mathbf{w}_o$.

On the contrary, when n > N, the number of equations in Eq. (12.98) is more than the
number of unknown tap weights that we wish to estimate. In that case, Eq. (12.98)
cannot be satisfied exactly. However, a least-squares solution can be found without any
ambiguity. The accuracy of the estimate $\hat{\mathbf{w}}(n)$ of $\mathbf{w}_o$ depends on the level of the
plant noise and also on the number of observed points, that is, the iteration number n. In particular,
in the case where the plant noise, $e_o(n)$, is zero, Eq. (12.98) can be satisfied exactly, for all
values of k, by choosing $\hat{\mathbf{w}}(n) = \mathbf{w}_o$. This clearly is the least-squares solution
to the problem, as it results in $\zeta(n) = 0$, which is the minimum achievable value of the cost
function $\zeta(n)$. The RLS algorithm, which is designed to minimize the cost function $\zeta(n)$,
will find this optimum estimate $\hat{\mathbf{w}}(n)$ once there are enough samples of the observed
input and desired output for the filter tap weights to be found without any ambiguity. This
explains the sharp drop of the learning curve of the RLS algorithm when n exceeds N.

Figure 12.2 Learning curves of the RLS algorithm for the modeling problem of Figure 12.1:
(a) $H(z) = H_1(z)$ and (b) $H(z) = H_2(z)$. In both cases, the forgetting factor $\lambda$ is chosen
according to Eq. (12.94) for 10% misadjustment. (Horizontal axis: number of iterations.)

The last part of the learning curve of the RLS algorithm, which decays exponentially,
matches the results of Section 12.5.3.
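The contrast between a small a posteriori error and a large a priori error when fewer than N samples have been observed can be reproduced with a deliberately simple batch least-squares example. Everything in the sketch below is hypothetical: a four-tap plant with unit weights, two noise-free observations chosen so that the regularized coefficient matrix is diagonal, and a forgetting factor of 1.

```python
from fractions import Fraction as F

N = 4
delta = F(1, 10000)                     # initialization constant
w_o = [F(1)] * N                        # hypothetical plant tap weights
xs = [[F(1), F(0), F(0), F(0)],         # x(1)
      [F(0), F(1), F(0), F(0)]]         # x(2): only n = 2 < N observations
ds = [sum(a * b for a, b in zip(w_o, x)) for x in xs]   # noise-free d(k)

# Psi(n) = delta*I + sum_k x(k) x^T(k) is diagonal for these inputs,
# so the normal equations can be solved entry by entry.
psi_diag = [delta + sum(x[i] * x[i] for x in xs) for i in range(N)]
theta = [sum(x[i] * d for x, d in zip(xs, ds)) for i in range(N)]
w = [theta[i] / psi_diag[i] for i in range(N)]

# A posteriori errors on the samples already observed.
post = [d - sum(a * b for a, b in zip(w, x)) for x, d in zip(xs, ds)]

# A priori error on a new sample x(3) that excites a tap not yet observed.
x3 = [F(0), F(0), F(1), F(0)]
prior = (sum(a * b for a, b in zip(w_o, x3))
         - sum(a * b for a, b in zip(w, x3)))

print([float(e) for e in post], float(prior))
```

The observed samples are reproduced almost perfectly, yet the estimate says nothing about the taps that have not been excited, so the error on the next sample remains of the order of the plant output itself.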


Problems 


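P12.1 The observed samples of the input to a three-tap filter are

$$\mathbf{x}(1) = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix},\quad
\mathbf{x}(2) = \begin{bmatrix} -2 \\ 1 \\ -1 \end{bmatrix},\quad
\mathbf{x}(3) = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},\quad
\mathbf{x}(4) = \begin{bmatrix} 0 \\ -1 \\ -1 \end{bmatrix}$$

(i) Find the projection and the orthogonal complement projection operators of
the set of observed input vectors.
(ii) Using the results of (i), find the least-squares estimate $\hat{\mathbf{y}}(4)$ of the desired
vector
$$\mathbf{d}(4) = [\,1 \;\; 2 \;\; {-1} \;\; {-1}\,]^{\mathrm{T}}$$
Also, obtain the associated estimation error vector. To check on the
accuracy of your results, evaluate the inner product of $\hat{\mathbf{y}}(4)$ and the
estimation error vector and show that it is equal to zero.
(iii) Repeat (ii) for
$$\mathbf{d}(4) = [\,0 \;\; {-1} \;\; 1 \;\; {-1}\,]^{\mathrm{T}}$$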


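P12.2 Repeat Problem P12.1 when the observed samples of input are x(1), x(2), and
x(3), as given there, and we wish to obtain the least-squares estimates, $\hat{\mathbf{y}}(3)$, of

(i) $\mathbf{d}(3) = [\,1 \;\; 1 \;\; 1\,]^{\mathrm{T}}$
(ii) $\mathbf{d}(3) = [\,1 \;\; {-1} \;\; 2\,]^{\mathrm{T}}$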


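P12.3 Show that the initialization $\boldsymbol{\Psi}_\lambda^{-1}(0) = \delta^{-1}\mathbf{I}$ will introduce a bias in $\hat{\mathbf{w}}(n)$ which
is given by
$$\Delta\mathbf{w}(n) = E[\hat{\mathbf{w}}(n)] - \mathbf{w}_o = -\delta\lambda^{n} E[\boldsymbol{\Psi}_\lambda^{-1}(n)]\,\mathbf{w}_o$$
Simplify this result for the case when $\lambda = 1$ and n is large.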


P12.4 Show that when $(\mathbf{X}^{\mathrm{T}}(N))^{-1}$ exists,
$$(\mathbf{X}^{\mathrm{T}}(N))^{-1} = (\mathbf{X}(N)\mathbf{X}^{\mathrm{T}}(N))^{-1}\mathbf{X}(N),$$
and thus conclude that the solution provided by the equation $\mathbf{X}^{\mathrm{T}}(N)\hat{\mathbf{w}}(N) = \mathbf{d}(N)$
and the least-squares solution $\hat{\mathbf{w}}(N) = (\mathbf{X}(N)\mathbf{X}^{\mathrm{T}}(N))^{-1}\mathbf{X}(N)\mathbf{d}(N)$ are
the same. What would be the a posteriori estimation error $\hat{e}_N(N)$?

P12.5 Show that in the RLS algorithm, when $\lambda = 1$, and the iteration number, n, is
large,
$$\mathcal{M}_{\mathrm{RLS}} = \frac{N}{n}$$
where N is the filter length.

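P12.6 Consider the case where the RLS algorithm begins with a nonzero $\hat{\mathbf{w}}(0)$ and
$\boldsymbol{\Psi}_\lambda(0) = \delta\mathbf{I}$.

(i) Show that this initial condition is equivalent to solving the problem of
least-squares according to the following procedure:
• Let $\boldsymbol{\Psi}_\lambda(0) = \delta\mathbf{I}$ and $\boldsymbol{\theta}_\lambda(0) = \delta\hat{\mathbf{w}}(0)$.
• Update $\boldsymbol{\Psi}_\lambda(n)$ and $\boldsymbol{\theta}_\lambda(n)$ using the recursions (12.41) and (12.42).
• Calculate $\hat{\mathbf{w}}(n)$ using the equation $\hat{\mathbf{w}}(n) = \boldsymbol{\Psi}_\lambda^{-1}(n)\boldsymbol{\theta}_\lambda(n)$.

(ii) Use the result of (i) to show that when $\delta$ is small, a nonzero choice of $\hat{\mathbf{w}}(0)$
has no significant effect on the convergence behavior of the RLS algorithm.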

P12.7 Give a detailed derivation of Eq. (12.71). 


P12.8 In some modeling applications, the tap-weight misalignment defined as
$$\eta(n) = E[(\hat{\mathbf{w}}(n) - \mathbf{w}_o)^{\mathrm{T}}(\hat{\mathbf{w}}(n) - \mathbf{w}_o)]$$
may be of interest.

(i) Using Eq. (12.73), find the steady-state misalignment of the RLS algorithm,
$\eta_{\mathrm{RLS}}(\infty)$, and show that it is a function of the eigenvalues of the correlation
matrix R. How does the eigenvalue spread of R affect $\eta_{\mathrm{RLS}}(\infty)$?

(ii) Refer to Chapter 6, Section 6.3.3, and show that for the LMS algorithm,

$\eta_{\mathrm{LMS}}(n)$ = sum of elements of the vector $\mathbf{k}'(n)$

Then, starting with Eq. (6.55), show that

$$\eta_{\mathrm{LMS}}(\infty) = \xi_{\min}\,\mu\,\frac{\displaystyle\sum_{i=0}^{N-1}\frac{1}{1-2\mu\lambda_i}}{\displaystyle 1-\sum_{i=0}^{N-1}\frac{\mu\lambda_i}{1-2\mu\lambda_i}}$$

For the case where $\mu\lambda_i \ll 1$, for all values of i, simplify this result and show
that $\eta_{\mathrm{LMS}}(\infty)$ depends only on $\sum_{i=0}^{N-1}\lambda_i = \mathrm{tr}[\mathbf{R}]$, and thus is independent
of the eigenvalue spread of R.

(iii) Discuss the significance of your observations in (i) and (ii).


P12.9 Consider the modified least-squares cost function
$$\zeta_a(n) = \sum_{k=1}^{n}\lambda^{n-k}\hat{e}_n^2(k) + \lambda^{n}K(\hat{\mathbf{w}}(n) - \hat{\mathbf{w}}(0))^{\mathrm{T}}(\hat{\mathbf{w}}(n) - \hat{\mathbf{w}}(0))$$
where $\hat{\mathbf{w}}(0) \neq \mathbf{0}$ is the initial tap-weight vector and K is a constant.

(i) Use this cost function to derive a modified RLS algorithm.
(ii) Obtain an expression for the learning curve of the filter and discuss the
convergence behavior of the proposed RLS algorithm, for small and large
values of K.

P12.10 Formulate the problem of modeling a memoryless nonlinear system with input
x(n) and output
$$y(n) = a x^3(n) + b x^2(n) + c x(n)$$
where a, b, and c are the unknown coefficients of the system that should be
found in the least-squares sense.

P12.11 Repeat Problem P12.10 for the case where
$$y(n) = a x^3(n) + b x^2(n) + c x(n) + d$$
and a, b, c, and d are the unknown coefficients of the system.


Computer-Oriented Problems 


P12.12 The MATLAB program used to obtain the simulation results of Figure 12.2 is
available on an accompanying website. This is written based on version I of the
RLS algorithm, as in Table 12.2, and is called rlsI.m. Run this program (or
develop and run your own program) to verify the results of Figure 12.2. Also,
to gain a better insight into the behavior of the RLS algorithm, try the following
runs:

(i) Run rlsI.m for $\hat{\mathbf{w}}(0) = \mathbf{0}$ and values of $\delta$ = 0.001, 0.01, 0.1, and 1, and
compare your results with those in Figure 12.2.
(ii) Run rlsI.m for $\delta$ = 0.0001 and a few (randomly selected) nonzero values
of $\hat{\mathbf{w}}(0)$. Comment on your observation.
(iii) Repeat (ii) for values of $\delta$ = 0.001, 0.01, 0.1, and 1.

P12.13 Develop a simulation program to study the variation of the a posteriori MSE,
$E[\hat{e}_n^2(n)]$, of the RLS algorithm in the case of the modeling problem of
Figure 12.1. Comment on your observation.

P12.14 Develop a simulation program to study the convergence behavior of the RLS
algorithm when applied to the channel equalization problem of Section 6.4.2.
Compare your results with those of the LMS algorithm in Figure 6.9a and b.

P12.15 Consider the case where a sequence x(n) is generated by passing a unit variance
white noise through a filter with the transfer function
$$H(z) = \frac{1}{1 - 1.6z^{-1} + 0.65z^{-2}}$$
By taking x(n) as input to a forward linear predictor of order N whose coefficients
are adapted using the RLS algorithm, study the convergence of the
predictor coefficients for N = 2, 3, and 4 and the choices of the forgetting factor
$\lambda$ = 0.9, 0.95, and 0.99 as n varies from 1 to 50. Discuss your observation.
Fast RLS Algorithms


The standard recursive least-squares (RLS) algorithm, which was introduced in
Chapter 12, has a computational complexity that grows in proportion to the square of
the length of the filter. For long filters, this may be unacceptable. In the past, many
researchers have attempted to overcome this drawback of the least-squares method and have
come up with a variety of elegant solutions. These solutions, whose computational
complexity grows in proportion to the length of the filter, are commonly referred to as
fast RLS algorithms.

In this chapter, we review the underlying principles that are fundamental to the
development of fast RLS algorithms. Our intention is by no means to cover the whole spectrum
of the fast RLS algorithms. A thorough treatment of these algorithms requires many more
pages than are allocated to this topic in this book. Moreover, such a treatment is
beyond the scope of this book, whose primary aim is to serve as an introductory textbook
on adaptive filters. Our aim is to put together the basic relations on which most fast
RLS algorithms are built. Once the reader has grasped these basic concepts, he or she
should feel comfortable proceeding to the more advanced topics on this subject
(see Haykin (1991, 1996) and Kalouptsidis and Theodoridis (1993) for more extensive
treatment of the RLS algorithms).

All the fast RLS algorithms benefit from order-update and time-update equations
similar to those introduced in Chapter 11. In other words, the fast RLS algorithms
combine the concepts of prediction and filtering in an elegant way to arrive at
computationally efficient implementations. Among these implementations, the RLS lattice
(RLSL) algorithm appears to be numerically the most robust. The fast
transversal RLS (FTRLS) algorithm (also known as the fast transversal filter, FTF), on
the other hand, is an alternative solution that has the lowest operation count
among all the present RLS algorithms.

In this chapter, our emphasis is on the RLSL algorithm. The derivation of the RLSL 
algorithm leads to a number of order- and time-update equations, which are fundamental 
to the derivation of the whole class of fast RLS algorithms. We also present the FTRLS 
algorithm as a by-product of these equations. Since the lattice structure is closely related to
forward and backward linear predictors, we begin with some preliminary discussion of
these predictors.
13.1 Least-Squares Forward Prediction


Consider the mth-order forward transversal predictor shown in Figure 13.1. The tap-weight
vector $\mathbf{a}_m(n) = [a_{m,1}(n)\;\; a_{m,2}(n)\;\; \cdots\;\; a_{m,m}(n)]^{\mathrm{T}}$ is optimized in the least-squares
sense over the entire observation interval k = 1, 2, ..., n. Accordingly, in the forward
transversal predictor, the observed tap-input vectors are $\mathbf{x}_m(0), \mathbf{x}_m(1), \ldots, \mathbf{x}_m(n-1)$,
where $\mathbf{x}_m(k) = [x(k)\;\; x(k-1)\;\; \cdots\;\; x(k-m+1)]^{\mathrm{T}}$, and the desired output samples are
x(1), x(2), ..., x(n). The normal equations of the forward transversal predictor are then
obtained as (Chapter 12)

$$\boldsymbol{\Psi}_m(n-1)\,\mathbf{a}_m(n) = \boldsymbol{\psi}^{f}_m(n) \tag{13.1}$$

where

$$\boldsymbol{\Psi}_m(n) = \sum_{k=1}^{n}\lambda^{n-k}\,\mathbf{x}_m(k)\,\mathbf{x}_m^{\mathrm{T}}(k) \tag{13.2}$$

$$\boldsymbol{\psi}^{f}_m(n) = \sum_{k=1}^{n}\lambda^{n-k}\,x(k)\,\mathbf{x}_m(k-1) \tag{13.3}$$

and $\lambda$ is the forgetting factor. The least-squares sum of the estimation errors is then given
by

$$\zeta^{ff}_m(n) = \sum_{k=1}^{n}\lambda^{n-k} f_{m,n}^{2}(k) \tag{13.4}$$

where

$$f_{m,n}(k) = x(k) - \mathbf{a}_m^{\mathrm{T}}(n)\,\mathbf{x}_m(k-1) \tag{13.5}$$

The sequence $f_{m,n}(k)$ is known as the a posteriori prediction error as its computation is
based on the latest value of the predictor tap-weight vector $\mathbf{a}_m(n)$. In contrast, the a priori
prediction error of the forward predictor is defined as

$$f_{m,n-1}(k) = x(k) - \mathbf{a}_m^{\mathrm{T}}(n-1)\,\mathbf{x}_m(k-1) \tag{13.6}$$

where $\mathbf{a}_m(n-1)$ is the previous value of the predictor tap-weight vector.

Figure 13.1 Transversal forward predictor.

We recall that according to the principle of orthogonality, the a posteriori estimation
error, $f_{m,n}(k)$, and the predictor tap-input vector, $\mathbf{x}_m(k-1)$, are orthogonal, in the sense
that

$$\sum_{k=1}^{n}\lambda^{n-k} f_{m,n}(k)\,\mathbf{x}_m(k-1) = \mathbf{0} \tag{13.7}$$

Substituting Eqs. (13.5) and (13.7) in Eq. (13.4), one can show that (Problem P13.1)

$$\zeta^{ff}_m(n) = \sum_{k=1}^{n}\lambda^{n-k}x^{2}(k) - \mathbf{a}_m^{\mathrm{T}}(n)\,\boldsymbol{\psi}^{f}_m(n) \tag{13.8}$$

This result could also be obtained by inserting the relevant variables in Eq. (12.16).
Application of the standard RLS algorithm for adapting the forward transversal predictor
results in the following recursion:

$$\mathbf{a}_m(n) = \mathbf{a}_m(n-1) + \mathbf{k}_m(n-1)\,f_{m,n-1}(n) \tag{13.9}$$

where $f_{m,n-1}(n)$ is the latest sample of the a priori estimation error of the forward
predictor and $\mathbf{k}_m(n-1)$ is the present gain vector of the algorithm. The time index
“n − 1” in the gain vector here follows the predictor tap-input whose latest value is
$\mathbf{x}_m(n-1)$. The use of this notation also keeps our notations consistent, as we proceed
with similar formulations for the backward transversal predictor. Furthermore, following
the results of Chapter 12, Eq. (12.48), the gain vector of the forward transversal predictor
is obtained as

$$\mathbf{k}_m(n-1) = \boldsymbol{\Psi}_m^{-1}(n-1)\,\mathbf{x}_m(n-1) \tag{13.10}$$


At this point, we may note that there are a few differences between some of our notations
here and those in Chapter 12. Since we will be making frequent use of the order-update
equations in the derivations of this chapter, the order of the predictor, m, is explicitly
reflected on all the variables. On the other hand, we have dropped the subscript $\lambda$ (the
forgetting factor) from the correlation matrix $\boldsymbol{\Psi}_\lambda(n)$ and the associated cross-correlation
vector to simplify the notations. However, the subscript m has been added to them to indicate
their dimensions. The hat-sign that was used to refer to optimized tap-weight vectors and
prediction errors is also dropped here in order to simplify the notations.
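
The recursion (13.9) and the gain (13.10) can be checked numerically against the batch normal equations (13.1). The following is a minimal sketch, assuming a first-order predictor (m = 1, so all quantities are scalars) and a synthetic white input; the helper names `psi_matrix` and `psi_forward` are illustrative and not from the text.

```python
import random

random.seed(1)
lam, n_max = 0.98, 30
# x[0] = 0 models prewindowing; x[1..n_max] are the observed samples.
x = [0.0] + [random.gauss(0.0, 1.0) for _ in range(n_max)]

def psi_matrix(n):
    """Scalar Psi_1(n) of Eq. (13.2) for m = 1."""
    return sum(lam ** (n - k) * x[k] ** 2 for k in range(1, n + 1))

def psi_forward(n):
    """Scalar psi^f_1(n) of Eq. (13.3) for m = 1."""
    return sum(lam ** (n - k) * x[k] * x[k - 1] for k in range(1, n + 1))

# Start from the batch solution at n = 2, the first instant at which
# Psi_1(n - 1) is nonzero, then apply the recursion (13.9)-(13.10).
a = psi_forward(2) / psi_matrix(1)
for n in range(3, n_max + 1):
    gain = x[n - 1] / psi_matrix(n - 1)   # k_1(n-1), Eq. (13.10)
    f_prior = x[n] - a * x[n - 1]         # a priori error f_{1,n-1}(n)
    a = a + gain * f_prior                # Eq. (13.9)

# Batch solution of the normal equation (13.1) at the final instant.
a_batch = psi_forward(n_max) / psi_matrix(n_max - 1)
print(a, a_batch)
```

The two printed values should agree to within rounding error. Recomputing `psi_matrix` from scratch at every step is done here only for clarity; it is not how an adaptive implementation would proceed.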


13.2 Least-Squares Backward Prediction 


Figure 13.2 depicts an mth-order backward transversal predictor. Here, the predictor
tap-weight and tap-input vectors are $\mathbf{g}_m(n) = [g_{m,1}(n)\;\; g_{m,2}(n)\;\; \cdots\;\; g_{m,m}(n)]^{\mathrm{T}}$ and
$\mathbf{x}_m(k) = [x(k)\;\; x(k-1)\;\; \cdots\;\; x(k-m+1)]^{\mathrm{T}}$, respectively. The tap-weight vector, $\mathbf{g}_m(n)$, of the
predictor is optimized in the least-squares sense over the entire observation interval
k = 1, 2, ..., n. The samples of the desired output are x(1 − m), x(2 − m), ..., x(n − m).

Figure 13.2 Transversal backward predictor.

Accordingly, the normal equations associated with the backward transversal predictor are
obtained as (Chapter 12)

$$\boldsymbol{\Psi}_m(n)\,\mathbf{g}_m(n) = \boldsymbol{\psi}^{b}_m(n) \tag{13.11}$$

where $\boldsymbol{\Psi}_m(n)$ is given in Eq. (13.2), and

$$\boldsymbol{\psi}^{b}_m(n) = \sum_{k=1}^{n}\lambda^{n-k}\,x(k-m)\,\mathbf{x}_m(k) \tag{13.12}$$

The least-squares sum of the estimation errors is then given by

$$\zeta^{bb}_m(n) = \sum_{k=1}^{n}\lambda^{n-k} b_{m,n}^{2}(k) \tag{13.13}$$

where

$$b_{m,n}(k) = x(k-m) - \mathbf{g}_m^{\mathrm{T}}(n)\,\mathbf{x}_m(k) \tag{13.14}$$

The sequence $b_{m,n}(k)$ is the a posteriori estimation error of the backward predictor. In
contrast, the a priori prediction error is defined as

$$b_{m,n-1}(k) = x(k-m) - \mathbf{g}_m^{\mathrm{T}}(n-1)\,\mathbf{x}_m(k) \tag{13.15}$$

where $\mathbf{g}_m(n-1)$ is the previous value of the predictor tap-weight vector.
We recall that according to the principle of orthogonality, the a posteriori estimation
error, $b_{m,n}(k)$, and the predictor tap-input vector, $\mathbf{x}_m(k)$, are orthogonal, in the sense that

$$\sum_{k=1}^{n}\lambda^{n-k} b_{m,n}(k)\,\mathbf{x}_m(k) = \mathbf{0} \tag{13.16}$$

Substituting Eqs. (13.14) and (13.16) in Eq. (13.13), one can show that (Problem P13.2)

$$\zeta^{bb}_m(n) = \sum_{k=1}^{n}\lambda^{n-k}x^{2}(k-m) - \mathbf{g}_m^{\mathrm{T}}(n)\,\boldsymbol{\psi}^{b}_m(n) \tag{13.17}$$

This result could also be obtained by inserting relevant variables in Eq. (12.16).
Application of the standard RLS algorithm for adapting the backward transversal
predictor results in the following recursion:

$$\mathbf{g}_m(n) = \mathbf{g}_m(n-1) + \mathbf{k}_m(n)\,b_{m,n-1}(n) \tag{13.18}$$

where $b_{m,n-1}(n)$ is the latest sample of the a priori estimation error of the backward
predictor and $\mathbf{k}_m(n)$ is the present gain vector of the algorithm, given by

$$\mathbf{k}_m(n) = \boldsymbol{\Psi}_m^{-1}(n)\,\mathbf{x}_m(n) \tag{13.19}$$

It is instructive to note that the gain vector of an mth-order forward predictor is equal to
the previous value of the gain vector of the associated backward predictor. This clearly
follows from the fact that the tap-input vectors of the forward predictor, $\mathbf{x}_m(k-1)$, and the
backward predictor, $\mathbf{x}_m(k)$, are one sample apart, for a given time instant k.


13.3 Least-Squares Lattice 


Figure 13.3 depicts the schematic of a lattice joint process estimator. This is similar to
the lattice joint process estimator presented in Chapter 11 (Figure 11.7). However, there
are some changes made in the notations here so that they can serve our discussion in
this chapter better. From the last chapter, we recall that in the least-squares optimization,
at any instant of time, say n, the filter parameters are optimized based on the observed
data samples from time 1 to n, so that a weighted sum of the error squares is minimized.
With this view of the problem, in Figure 13.3, the time index of the signal sequences
is chosen to be k, and at time n, k is varied from 1 to n. Furthermore, following the
notations in the last two sections, the estimation errors are labeled with two subscripts.
The first subscript denotes the filter/predictor length/order. The second subscript indicates
the length, n, of the observed data. The lattice PARCOR coefficients, $\kappa^{f}_m(n)$ and $\kappa^{b}_m(n)$,
and the regressor coefficients, $c_m(n)$, are also labeled with time index n to emphasize
that they are optimized in the least-squares sense on the basis of the data samples up to
time n. In addition, to facilitate the derivation of the RLS lattice in the next section, the
summer at the output of the joint process estimator is divided into a set of distributed
adders so that the estimation errors of order 1 to N (denoted as $e_{1,n}(k)$ through $e_{N,n}(k)$)
can be obtained in a sequential manner, as will be explained later.
From Chapter 11, we recall that when the input sequence, x(n), is stationary, and the
lattice coefficients are optimized to minimize the mean-squared errors of the forward and
backward predictors, the PARCOR coefficients $\kappa^{f}_m$ and $\kappa^{b}_m$ are found to be equal. However,
we note that this is not the case when the optimization of the lattice coefficients is based
on least-squares criteria. Noting this, throughout this chapter, we keep the superscripts f
and b on $\kappa^{f}_m$ and $\kappa^{b}_m$, respectively, to differentiate between the two.


Figure 13.3 Least-squares lattice joint process estimator.

To optimize the coefficients of the lattice joint process estimator, the following sums
are minimized, simultaneously:

$$\zeta^{ff}_m(n) = \sum_{k=1}^{n}\lambda^{n-k} f_{m,n}^{2}(k), \quad \text{for } m = 1, 2, \ldots, N-1 \tag{13.20}$$

$$\zeta^{bb}_m(n) = \sum_{k=1}^{n}\lambda^{n-k} b_{m,n}^{2}(k), \quad \text{for } m = 1, 2, \ldots, N-1 \tag{13.21}$$

$$\zeta^{ee}_m(n) = \sum_{k=1}^{n}\lambda^{n-k} e_{m,n}^{2}(k), \quad \text{for } m = 1, 2, \ldots, N \tag{13.22}$$

where $f_{m,n}(k)$ and $b_{m,n}(k)$ are the a posteriori estimation errors as defined before, and
similarly $e_{m,n}(k)$ is defined as the a posteriori estimation error of the length m joint
process estimator. Note from Figure 13.3 that there are effectively N forward and N
backward predictors of order 1 to N, as well as N joint process estimators of length 1 to
N, optimized simultaneously. Furthermore, it is important to note that the least-squares
sums in Eqs. (13.20) to (13.22) are independent of whether the predictors and joint process
estimators are implemented in transversal or lattice forms. We will exploit this fact to
simplify the derivations that follow by switching between the equations derived for the
transversal and lattice forms to arrive at the desired results.

In Chapter 11, we discussed a number of properties of the lattice structure. In particular, 
we noted that the backward prediction errors of different orders are orthogonal (uncor- 
related) with one another. This and many other properties of the lattice structure, which 
were discussed in Chapter 11 based on stochastic averages, are equally applicable to the 
lattice structure of Figure 13.3, where optimization of the filter coefficients is done based 
on time averages (because of the least-squares optimization). The most important proper- 
ties of the least-squares lattice that are relevant to the derivation of the RLSL algorithm 
in the next section are the following: 


1. At time n, the PARCOR coefficients $\kappa^{f}_m(n)$ and $\kappa^{b}_m(n)$ of the successive stages of the
lattice structure can be optimized sequentially as follows. We first note that the sequences at
the tap inputs of the first stage (i.e., the signals multiplied by the PARCOR coefficients
$\kappa^{f}_1(n)$ and $\kappa^{b}_1(n)$) are $f_{0,n}(k)$ and $b_{0,n-1}(k-1)$, for k = 1, 2, ..., n (Figure 13.3). Using
these sequences, the coefficients $\kappa^{f}_1(n)$ and $\kappa^{b}_1(n)$ are optimized so that the output
sequences $f_{1,n}(k)$ and $b_{1,n}(k)$ of the first stage are minimized in the least-squares sense.
Next, we note that the tap inputs in the second stage are $f_{1,n}(k)$ and $b_{1,n-1}(k-1)$.
Accordingly, considering the sequences $f_{1,n}(k)$ and $b_{1,n-1}(k-1)$, for k = 1, 2, ..., n,
as tap inputs to the second stage, the coefficients $\kappa^{f}_2(n)$ and $\kappa^{b}_2(n)$ are optimized so that
the output sequences $f_{2,n}(k)$ and $b_{2,n}(k)$ of the second stage are minimized in the
least-squares sense. This process continues for the rest of the stages as well. The above
process leads to the following equations, which are the bases for derivation of the
RLS algorithm, as explained in the next section:

$$\kappa^{f}_m(n) = \frac{\sum_{k=1}^{n}\lambda^{n-k} f_{m-1,n}(k)\,b_{m-1,n-1}(k-1)}{\sum_{k=1}^{n}\lambda^{n-k} b_{m-1,n-1}^{2}(k-1)} \tag{13.23}$$

and

$$\kappa^{b}_m(n) = \frac{\sum_{k=1}^{n}\lambda^{n-k} f_{m-1,n}(k)\,b_{m-1,n-1}(k-1)}{\sum_{k=1}^{n}\lambda^{n-k} f_{m-1,n}^{2}(k)} \tag{13.24}$$

for m = 1, 2, ..., N − 1.

2. Once the PARCOR coefficients are optimized, the backward prediction errors $b_{0,n}(k)$,
$b_{1,n}(k)$, ..., $b_{N-1,n}(k)$ are orthogonal with one another, in the sense that

$$\sum_{k=1}^{n}\lambda^{n-k} b_{i,n}(k)\,b_{j,n}(k) = 0 \tag{13.25}$$

for any pair of unequal i and j in the range of 0 to N − 1.

3. The regressor coefficients $c_0(n)$, $c_1(n)$, ..., $c_{N-1}(n)$ may also be optimized in a
sequential manner. That is, first $c_0(n)$ is optimized by minimizing $\zeta^{ee}_1(n)$. We then hold $c_0(n)$,
run the sequence {x(1), x(2), ..., x(n)} through the first stage of the lattice, and
optimize $c_1(n)$ so that $\zeta^{ee}_2(n)$ is minimized. This process continues for the rest of the joint
process estimator coefficients as well. To summarize, the regressor coefficients $c_0(n)$,
$c_1(n)$, ..., $c_{N-1}(n)$ are obtained according to the following equations:

$$c_m(n) = \frac{\sum_{k=1}^{n}\lambda^{n-k} e_{m,n}(k)\,b_{m,n}(k)}{\sum_{k=1}^{n}\lambda^{n-k} b_{m,n}^{2}(k)} \tag{13.26}$$

for m = 0, 1, ..., N − 1.


Equations (13.23), (13.24), and (13.26), although fundamental in providing a clear 
understanding of the underlying principles in the development of the least-squares lattice 
algorithm, cannot be used for the computation of the lattice coefficients in an adaptive 
application, as their computational complexity grows with the number of data samples, 
n. As in the case of the standard RLS algorithm, the problem is solved by finding a set 
of equations that update the filter coefficients in a recursive manner. This is the subject 
of the next section. 
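
To make the batch formulas concrete, the sketch below evaluates Eqs. (13.23) and (13.24) for the first lattice stage (m = 1), where the stage inputs reduce to $f_{0,n}(k) = x(k)$ and $b_{0,n-1}(k-1) = x(k-1)$. It is an illustration under stated assumptions only: the input is a synthetic first-order autoregressive sequence, the stage update is taken as $f_1 = f_0 - \kappa^{f}_1 b_0$ and $b_1 = b_0 - \kappa^{b}_1 f_0$ (the sign convention implied by the positive ratios in Eqs. (13.23) and (13.24)), and the helper `wsum` is a hypothetical name.

```python
import random

random.seed(2)
lam, n = 0.97, 40
# x[0] = 0 models prewindowing; x[1..n] are the observed samples.
x = [0.0] + [random.gauss(0.0, 1.0) for _ in range(n)]
for k in range(2, n + 1):
    x[k] += 0.7 * x[k - 1]   # hypothetical AR(1) colouring of the input

def wsum(u, v):
    """Weighted inner product: sum over k = 1..n of lam^(n-k) u(k) v(k)."""
    return sum(lam ** (n - k) * u[k] * v[k] for k in range(1, n + 1))

# Zeroth-order errors: f_{0,n}(k) = x(k) and b_{0,n-1}(k-1) = x(k-1).
f0 = x[:]
b0_delayed = [0.0] + x[:-1]

cross = wsum(f0, b0_delayed)
kappa_f = cross / wsum(b0_delayed, b0_delayed)   # Eq. (13.23), m = 1
kappa_b = cross / wsum(f0, f0)                   # Eq. (13.24), m = 1

# First-stage outputs of the lattice (assumed sign convention, see lead-in).
f1 = [f0[k] - kappa_f * b0_delayed[k] for k in range(n + 1)]
b1 = [b0_delayed[k] - kappa_b * f0[k] for k in range(n + 1)]

# Orthogonality checks: b_{0,n}(k) = x(k) against b_{1,n}(k), as in
# Eq. (13.25), and f_{1,n}(k) against the delayed backward error.
orth_b = wsum(f0, b1)
orth_f = wsum(f1, b0_delayed)
print(kappa_f, kappa_b, orth_b, orth_f)
```

The two PARCOR values generally differ because their denominators are different weighted energies, while both orthogonality sums should vanish to within rounding error. The cost of each weighted sum grows with n, which is the reason a recursive formulation is needed.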


13.4 RLSL Algorithm 


In this section, we go through a systematic step-by-step procedure to develop the
recursions necessary for the derivation of the RLSL algorithm. The development of the RLSL
algorithm involves a large number of variables, compared to any of the algorithms that
we have derived or discussed thus far in this book. Because of this, it is often difficult for
a novice to the topic to follow these equations. Thus, choosing a set of notations
that reduces this burden is crucial to a readable presentation of this topic.
Bearing this in mind, our discussion of the RLSL algorithm begins with an introduction
to the notations and some preliminaries. The derivations follow thereafter.


13.4.1 Notations and Preliminaries 


In the past few sections, we introduced a number of notations for formulating the
least-squares solutions in the cases of forward and backward transversal predictors as well as
the lattice joint process estimator. Here, we introduce some more notations and also some
new definitions that are necessary for the derivations that follow.

Prewindowing of Input Data: Throughout the discussion in the remainder of this chapter,
we assume that the samples of the input signal, x(k), are all zero for values of k ≤ 0. This
assumption on the input signal is known as prewindowing. In this book, we do not consider
other variations of the fast RLS algorithms that are based on other types of windowing
methods (Honig and Messerschmitt, 1984; Alexander, 1986a,b).


A Priori and A Posteriori Estimation Errors: We noted that the subscript n in the
sequences $f_{m,n}(k)$, $b_{m,n}(k)$, and $e_{m,n}(k)$ denotes that they are a posteriori estimation
errors. The term a posteriori signifies that the errors are obtained using the filter (predictor)
coefficients, which have been optimized using the past as well as the present samples of
the input and desired output, that is, x(k) and d(k), for k = 1, 2, ..., n. In other words, the
a posteriori estimation errors are obtained when the lattice coefficients $\kappa^{f}_m(n)$, $\kappa^{b}_m(n)$,
and $c_m(n)$, as given by Eqs. (13.23), (13.24), and (13.26), are chosen. In contrast, if
we compute these estimation errors using the last values of the joint process estimator
coefficients, that is, $\kappa^{f}_m(n-1)$, $\kappa^{b}_m(n-1)$, and $c_m(n-1)$, then the resulting errors
are known as the a priori estimation errors. We use the notations $f_{m,n-1}(k)$, $b_{m,n-1}(k)$, and $e_{m,n-1}(k)$
(with the subscript n − 1 signifying the use of the last values of the lattice coefficients,
$\kappa^{f}_m(n-1)$, $\kappa^{b}_m(n-1)$, and $c_m(n-1)$) to refer to the a priori estimation errors.


Conversion Factor, $\gamma_m(n)$: A key result in the development of the fast RLS algorithms
is the following relationship:

$$\frac{e_{m,n-1}(n)}{e_{m,n}(n)} = \frac{b_{m,n-1}(n)}{b_{m,n}(n)} = \frac{f_{m,n}(n+1)}{f_{m,n+1}(n+1)} \tag{13.27}$$

Note that in Eq. (13.27) the numerators are the a priori estimation errors and the
denominators are the a posteriori estimation errors. To appreciate this relationship, we note that
the tap-input vector to the length m joint process estimator at time n is
$\mathbf{x}_m(n) = [x(n)\;\; x(n-1)\;\; \cdots\;\; x(n-m+1)]^{\mathrm{T}}$. This is also the tap-input vector to the mth-order backward
predictor at time n and that of the forward predictor at time n + 1. Noting this, it appears
that, in general, the ratio of the a priori and a posteriori estimation errors depends only
on the tap-input vector of the filter (predictor or joint process estimator). This ratio, which
will be discussed in detail later, is called the conversion factor, denoted as $\gamma_m(n)$.


Least-Squares Error Sums, $\zeta^{ee}_m(n)$, $\zeta^{ff}_m(n)$, and $\zeta^{bb}_m(n)$: We recall that $\zeta^{ee}_m(n)$, $\zeta^{ff}_m(n)$, and
$\zeta^{bb}_m(n)$, as defined in Eqs. (13.22), (13.20), and (13.21), respectively, refer to the
least-squares error sums of the joint process estimator of length m, and forward and backward
predictors of order m. Note also that all these are based on the a posteriori estimation
errors.

Cross-correlations, $\zeta^{fb}_m(n)$ and $\zeta^{be}_m(n)$: The summation in the numerator of Eq. (13.23)
(and also Eq. (13.24)) may be defined as the (deterministic) cross-correlation between the
forward and backward prediction errors, $f_{m-1,n}(k)$ and $b_{m-1,n-1}(k-1)$. Similarly, the
summation in the numerator of Eq. (13.26) may be called the cross-correlation between
the backward prediction error, $b_{m,n}(k)$, and the joint process estimation error, $e_{m,n}(k)$.
Accordingly, we define

$$\zeta^{fb}_m(n) = \sum_{k=1}^{n}\lambda^{n-k} f_{m,n}(k)\,b_{m,n-1}(k-1) \tag{13.28}$$

and

$$\zeta^{be}_m(n) = \sum_{k=1}^{n}\lambda^{n-k} e_{m,n}(k)\,b_{m,n}(k) \tag{13.29}$$

To follow the same terminology, we may refer to the least-squares sums $\zeta^{ff}_m(n)$, $\zeta^{bb}_m(n)$,
and $\zeta^{ee}_m(n)$ as autocorrelations.

Using Eqs. (13.20), (13.21), (13.22), (13.28), and (13.29), the set of Eqs. (13.23),
(13.24), and (13.26) are written as

$$\kappa^{f}_m(n) = \frac{\zeta^{fb}_{m-1}(n)}{\zeta^{bb}_{m-1}(n-1)} \tag{13.30}$$

$$\kappa^{b}_m(n) = \frac{\zeta^{fb}_{m-1}(n)}{\zeta^{ff}_{m-1}(n)} \tag{13.31}$$

and

$$c_m(n) = \frac{\zeta^{be}_m(n)}{\zeta^{bb}_m(n)} \tag{13.32}$$

We later develop a set of equations for recursive updating of the auto- and
cross-correlations that were just defined. The updated auto- and cross-correlations will then be
substituted into Eqs. (13.30) to (13.32) for the computation of the lattice coefficients at every
iteration.


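Augmented Normal Equations for Forward and Backward Prediction: Using the definition
(13.2), $\boldsymbol{\Psi}_{m+1}(n)$ may be extended as

$$\boldsymbol{\Psi}_{m+1}(n) = \begin{bmatrix} \psi^{ff}(n) & \boldsymbol{\psi}^{f\,\mathrm{T}}_m(n) \\ \boldsymbol{\psi}^{f}_m(n) & \boldsymbol{\Psi}_m(n-1) \end{bmatrix} \tag{13.33}$$

where $\boldsymbol{\psi}^{f}_m(n)$ is defined by Eq. (13.3) and

$$\psi^{ff}(n) = \sum_{k=1}^{n}\lambda^{n-k}x^{2}(k) \tag{13.34}$$

Using Eqs. (13.33) and (13.34), Eqs. (13.1) and (13.8) may be combined together to
obtain

$$\boldsymbol{\Psi}_{m+1}(n)\,\tilde{\mathbf{a}}_m(n) = \begin{bmatrix} \zeta^{ff}_m(n) \\ \mathbf{0}_m \end{bmatrix} \tag{13.35}$$

where

$$\tilde{\mathbf{a}}_m(n) = \begin{bmatrix} 1 \\ -\mathbf{a}_m(n) \end{bmatrix} \tag{13.36}$$

$\mathbf{0}_m$ denotes the m-by-1 zero vector, and $\mathbf{a}_m(n)$ is the tap-weight vector of the transversal
forward predictor, optimized in the least-squares sense. Equation (13.35) is known as the
augmented normal equation for the forward predictor of order m.

The matrix $\boldsymbol{\Psi}_{m+1}(n)$ may also be extended as

$$\boldsymbol{\Psi}_{m+1}(n) = \begin{bmatrix} \boldsymbol{\Psi}_m(n) & \boldsymbol{\psi}^{b}_m(n) \\ \boldsymbol{\psi}^{b\,\mathrm{T}}_m(n) & \psi^{bb}_m(n) \end{bmatrix} \tag{13.37}$$

where $\boldsymbol{\psi}^{b}_m(n)$ is defined by Eq. (13.12) and

$$\psi^{bb}_m(n) = \sum_{k=1}^{n}\lambda^{n-k}x^{2}(k-m) \tag{13.38}$$

Using Eqs. (13.37) and (13.38), Eqs. (13.11) and (13.17) may be combined together to
obtain

$$\boldsymbol{\Psi}_{m+1}(n)\,\tilde{\mathbf{g}}_m(n) = \begin{bmatrix} \mathbf{0}_m \\ \zeta^{bb}_m(n) \end{bmatrix} \tag{13.39}$$

where

$$\tilde{\mathbf{g}}_m(n) = \begin{bmatrix} -\mathbf{g}_m(n) \\ 1 \end{bmatrix} \tag{13.40}$$

and $\mathbf{g}_m(n)$ is the tap-weight vector of the transversal backward predictor, optimized in
the least-squares sense. Equation (13.39) is known as the augmented normal equation for the
backward predictor of order m.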


13.4.2 Update Recursion for the Least-Squares Error Sums 


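Consider an N-tap transversal filter with the tap-input vector
$\mathbf{x}(k) = [x(k)\;\; x(k-1)\;\; \cdots\;\; x(k-N+1)]^{\mathrm{T}}$ and desired output d(k). From Chapter 12, we recall that the least-squares
error sum of the filter at time n is (see footnote 1)

$$\zeta_{\min}(n) = \mathbf{d}^{\mathrm{T}}(n)\boldsymbol{\Lambda}(n)\mathbf{d}(n) - \boldsymbol{\theta}^{\mathrm{T}}(n)\hat{\mathbf{w}}(n) \tag{13.41}$$

where $\hat{\mathbf{w}}(n)$ is the optimized tap-weight vector of the filter, $\mathbf{d}(n) = [d(1)\;\; d(2)\;\; \cdots\;\; d(n)]^{\mathrm{T}}$,

$$\boldsymbol{\theta}(n) = \sum_{k=1}^{n}\lambda^{n-k}\,d(k)\,\mathbf{x}(k)$$

and $\boldsymbol{\Lambda}(n)$ is the diagonal matrix consisting of the powers of the forgetting factor, $\lambda$, as
defined in Eq. (12.35). Substituting Eqs. (12.42), (12.52), and

$$\mathbf{d}^{\mathrm{T}}(n)\boldsymbol{\Lambda}(n)\mathbf{d}(n) = d^{2}(n) + \lambda\,\mathbf{d}^{\mathrm{T}}(n-1)\boldsymbol{\Lambda}(n-1)\mathbf{d}(n-1)$$

in Eq. (13.41) and rearranging, we obtain

$$\begin{aligned}
\zeta_{\min}(n) &= \lambda\left(\mathbf{d}^{\mathrm{T}}(n-1)\boldsymbol{\Lambda}(n-1)\mathbf{d}(n-1) - \boldsymbol{\theta}^{\mathrm{T}}(n-1)\hat{\mathbf{w}}(n-1)\right) \\
&\quad + d(n)\left(d(n) - \hat{\mathbf{w}}^{\mathrm{T}}(n-1)\mathbf{x}(n)\right) - \mathbf{x}^{\mathrm{T}}(n)\mathbf{k}(n)\,e_{n-1}(n)\,d(n) \\
&\quad - \lambda\,\boldsymbol{\theta}^{\mathrm{T}}(n-1)\mathbf{k}(n)\,e_{n-1}(n) \\
&= \lambda\,\zeta_{\min}(n-1) + d(n)\,e_{n-1}(n) - \left(\mathbf{x}(n)d(n) + \lambda\boldsymbol{\theta}(n-1)\right)^{\mathrm{T}}\mathbf{k}(n)\,e_{n-1}(n)
\end{aligned} \tag{13.42}$$

where we have noted that $d(n) - \hat{\mathbf{w}}^{\mathrm{T}}(n-1)\mathbf{x}(n)$ is the a priori estimation error $e_{n-1}(n)$.
Furthermore, from Eq. (12.42), we note that $\mathbf{x}(n)d(n) + \lambda\boldsymbol{\theta}(n-1) = \boldsymbol{\theta}(n)$. Using this
result and Eq. (12.48), we obtain

$$\begin{aligned}
\left(\mathbf{x}(n)d(n) + \lambda\boldsymbol{\theta}(n-1)\right)^{\mathrm{T}}\mathbf{k}(n) &= \boldsymbol{\theta}^{\mathrm{T}}(n)\boldsymbol{\Psi}^{-1}(n)\mathbf{x}(n) \\
&= \hat{\mathbf{w}}^{\mathrm{T}}(n)\mathbf{x}(n)
\end{aligned} \tag{13.43}$$

where we have noted that $\boldsymbol{\theta}^{\mathrm{T}}(n)\boldsymbol{\Psi}^{-1}(n) = (\boldsymbol{\Psi}^{-1}(n)\boldsymbol{\theta}(n))^{\mathrm{T}} = \hat{\mathbf{w}}^{\mathrm{T}}(n)$, as $\boldsymbol{\Psi}^{-1}(n)$ is
symmetrical. Substituting Eq. (13.43) in Eq. (13.42) and rearranging, we obtain

$$\zeta_{\min}(n) = \lambda\,\zeta_{\min}(n-1) + e_{n}(n)\,e_{n-1}(n) \tag{13.44}$$

where $e_{n}(n) = d(n) - \hat{\mathbf{w}}^{\mathrm{T}}(n)\mathbf{x}(n)$ is the a posteriori estimation error. Thus, to update
$\zeta_{\min}(n-1)$, we only need to know the a priori and a posteriori estimation errors at
instant n, that is, $e_{n-1}(n)$ and $e_{n}(n)$, respectively.

1 It may be noted that Eq. (13.41) is similar to Eq. (12.16) with the forgetting factor, $\lambda$, included in the results.
In addition, to be consistent with the rest of our notations in this chapter, the subscript $\lambda$ has been dropped from
vectors and matrices.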

Recursion Eq. (13.44) can readily be applied to update the least-squares error sums of
the forward and backward predictors as well as the joint process estimator. The results
are

$$\zeta^{ff}_m(n) = \lambda\,\zeta^{ff}_m(n-1) + f_{m,n}(n)\,f_{m,n-1}(n) \tag{13.45}$$

$$\zeta^{bb}_m(n) = \lambda\,\zeta^{bb}_m(n-1) + b_{m,n}(n)\,b_{m,n-1}(n) \tag{13.46}$$

$$\zeta^{ee}_m(n) = \lambda\,\zeta^{ee}_m(n-1) + e_{m,n}(n)\,e_{m,n-1}(n) \tag{13.47}$$

where $f_{m,n-1}(n)$, $f_{m,n}(n)$, $b_{m,n-1}(n)$, $b_{m,n}(n)$, $e_{m,n-1}(n)$, and $e_{m,n}(n)$, as defined before,
are the associated a priori and a posteriori estimation errors.

We note that the above update equations involve the use of both the a priori and a
posteriori estimation errors. We next see that with the aid of the conversion factor, $\gamma_m(n)$,
the above recursions may be written in terms of only the a priori or only the a posteriori
estimation errors.
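
The update (13.44) can be verified numerically by computing the least-squares solutions at times n − 1 and n independently, straight from their definitions. The sketch below is only an illustration under assumed conditions: a two-tap filter, a synthetic plant with hypothetical coefficients 0.8 and −0.3, and helper names (`solve2`, `batch_ls`) that do not appear in the text.

```python
import random

def solve2(A, b):
    """Solve the 2-by-2 linear system A w = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def batch_ls(x, d, n, lam):
    """Exponentially weighted least squares over k = 1..n for a 2-tap filter.

    Returns the optimum tap weights and the minimum weighted error sum,
    both computed directly from their definitions (no recursion).
    """
    Psi = [[0.0, 0.0], [0.0, 0.0]]
    theta = [0.0, 0.0]
    for k in range(1, n + 1):
        wgt = lam ** (n - k)
        xk = [x[k], x[k - 1]]
        for i in range(2):
            theta[i] += wgt * d[k] * xk[i]
            for j in range(2):
                Psi[i][j] += wgt * xk[i] * xk[j]
    w = solve2(Psi, theta)
    zeta = sum(lam ** (n - k) * (d[k] - w[0] * x[k] - w[1] * x[k - 1]) ** 2
               for k in range(1, n + 1))
    return w, zeta

random.seed(0)
lam, n = 0.95, 20
# Index 0 holds x(0) = 0 (prewindowing); samples x(1)..x(n) are random.
x = [0.0] + [random.gauss(0.0, 1.0) for _ in range(n)]
d = [0.0] + [0.8 * x[k] - 0.3 * x[k - 1] + 0.1 * random.gauss(0.0, 1.0)
             for k in range(1, n + 1)]

w_prev, zeta_prev = batch_ls(x, d, n - 1, lam)
w_now, zeta_now = batch_ls(x, d, n, lam)
e_prior = d[n] - w_prev[0] * x[n] - w_prev[1] * x[n - 1]   # e_{n-1}(n)
e_post = d[n] - w_now[0] * x[n] - w_now[1] * x[n - 1]      # e_n(n)

lhs = zeta_now
rhs = lam * zeta_prev + e_post * e_prior                   # Eq. (13.44)
print(abs(lhs - rhs))
```

The printed difference should be at rounding-error level, which confirms that only the two errors at instant n are needed to carry the error sum forward.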


13.4.3 Conversion Factor 


Using recursion Eq. (12.52), the a posteriori estimation error e„(n) of a transversal filter 
with the least-squares optimized tap-weight vector w(n), tap-input vector x(n), and desired 
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output d(n) may be expanded as 
e,(n) = d(n) — W" (n)x(n) 
= d(n) — (W(n — 1) + k(nJe,_(n))™x(n) 
= d(n) — W'(n — I)x(n) — k"(n)x(ne,_1(n) 
= e,_;(n) —k"(n)x(n)e,_1(n) 
= (1 —k"(n)x(n))e,_1(n) (13.48) 


where e,,_,(n) = d(n) T(n — 1)x(n) is the a priori estimation error. We note that the 
a priori and a posteriori estimation errors, e„_;(n) and e, (m), respectively, are related by 
the factor 1 — k'(n)x(n). This is called conversion factor, as mentioned in Section 13.4.1, 
and is denoted by y(n). Thus, 


y(n) = 1—k"(n)x(n) (13.49) 
Substituting Eq. (12.48) in Eq. (13.49), we also obtain 
y(n) = 1—x"(n)W~! (n)x(n) (13.50) 


An interesting interpretation of y(n) whose study is left for the reader in Problem P13.8 
reveals that y(n) is a positive quantity less than or equal to 1. Another interesting property 
of y(n) is seen by noting that for a given forgetting factor, A, ¥(n) depends only on the 
input samples to the filter. Accordingly, y(n) can be found once the observed tap-input 
vectors to the filter are known. The following cases are then identified: 


1. In an mth-order forward predictor, with the observed tap-input vectors $\mathbf{x}_m(0)$, $\mathbf{x}_m(1)$,
   ..., $\mathbf{x}_m(n-1)$ (with $\mathbf{x}_m(0) = \mathbf{0}$, because of prewindowing the input data), the
   conversion factor is recognized as

   $$\gamma_m(n-1) = 1 - \mathbf{x}_m^T(n-1)\boldsymbol{\Psi}_m^{-1}(n-1)\mathbf{x}_m(n-1) = 1 - \mathbf{k}_m^T(n-1)\mathbf{x}_m(n-1) \tag{13.51}$$

   where $\mathbf{k}_m(n-1)$ is the gain vector of the forward predictor, as was identified before
   (Section 13.1). Accordingly, the a priori and a posteriori estimation errors $f_{m,n-1}(n)$
   and $f_{m,n}(n)$ of the forward predictor are related according to the following equation:

   $$f_{m,n}(n) = \gamma_m(n-1)\, f_{m,n-1}(n) \tag{13.52}$$

2. Similarly, in an mth-order backward predictor, with the observed tap-input vectors
   $\mathbf{x}_m(1)$, $\mathbf{x}_m(2)$, ..., $\mathbf{x}_m(n)$, the conversion factor is recognized as

   $$\gamma_m(n) = 1 - \mathbf{x}_m^T(n)\boldsymbol{\Psi}_m^{-1}(n)\mathbf{x}_m(n) = 1 - \mathbf{k}_m^T(n)\mathbf{x}_m(n) \tag{13.53}$$

   Accordingly, the a priori and a posteriori estimation errors $b_{m,n-1}(n)$ and $b_{m,n}(n)$ of
   the backward predictor are related according to the following equation:

   $$b_{m,n}(n) = \gamma_m(n)\, b_{m,n-1}(n) \tag{13.54}$$

3. The observed tap-input vectors to an m-tap joint process estimator are $\mathbf{x}_m(1)$, $\mathbf{x}_m(2)$, ...,
   $\mathbf{x}_m(n)$. Since these are the same as the tap-input vectors to the mth-order backward
   predictor, the conversion factor of the m-tap joint process estimator is also $\gamma_m(n)$, given by
   Eq. (13.53). Accordingly, the a priori and a posteriori estimation errors $e_{m,n-1}(n)$ and
   $e_{m,n}(n)$ of the joint process estimator are related according to the following equation:

   $$e_{m,n}(n) = \gamma_m(n)\, e_{m,n-1}(n) \tag{13.55}$$
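The relations above can be checked numerically. The following is a minimal sketch, not part of the original text, that runs a conventional two-tap RLS filter and records the conversion factor of Eq. (13.49) together with the a priori and a posteriori errors. The plant coefficients, the noise level, the soft start $\mathbf{P}(0) = \delta^{-1}\mathbf{I}$, and the function name `rls_conversion_demo` are all illustrative assumptions.

```python
import random


def rls_conversion_demo(num_samples=200, lam=0.98, delta=1e-2, seed=1):
    """Run a 2-tap RLS filter; return (gamma, a priori error, a posteriori error) per sample."""
    rng = random.Random(seed)
    w_true = [0.7, -0.3]            # hypothetical plant, for illustration only
    w = [0.0, 0.0]
    # P tracks the inverse of Psi(n); P(0) = I/delta is an assumed soft start
    P = [[1.0 / delta, 0.0], [0.0, 1.0 / delta]]
    x_prev = 0.0                    # prewindowing: x(k) = 0 before the first sample
    records = []
    for _ in range(num_samples):
        x_new = rng.gauss(0.0, 1.0)
        x = [x_new, x_prev]
        d = w_true[0] * x[0] + w_true[1] * x[1] + 0.1 * rng.gauss(0.0, 1.0)
        Px = [P[0][0] * x[0] + P[0][1] * x[1], P[1][0] * x[0] + P[1][1] * x[1]]
        denom = lam + x[0] * Px[0] + x[1] * Px[1]
        k = [Px[0] / denom, Px[1] / denom]            # gain vector k(n)
        gamma = 1.0 - (k[0] * x[0] + k[1] * x[1])     # Eq. (13.49)
        e_prior = d - (w[0] * x[0] + w[1] * x[1])     # uses w(n-1)
        w = [w[0] + k[0] * e_prior, w[1] + k[1] * e_prior]
        e_post = d - (w[0] * x[0] + w[1] * x[1])      # uses w(n)
        # P(n) = (P(n-1) - k(n) x^T(n) P(n-1)) / lam; P is symmetric, so x^T P = (P x)^T
        P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(2)] for i in range(2)]
        records.append((gamma, e_prior, e_post))
        x_prev = x_new
    return records
```

In such a run every recorded `gamma` should lie in the interval (0, 1], and each a posteriori error should equal `gamma` times the corresponding a priori error up to rounding, which is the content of Eq. (13.55) for a fixed-order filter.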


13.4.4 Update Equation for Conversion Factor 


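First, we show that

$$\boldsymbol{\Psi}_{m+1}^{-1}(n) = \begin{bmatrix} \boldsymbol{\Psi}_m^{-1}(n) & \mathbf{0}_m \\ \mathbf{0}_m^T & 0 \end{bmatrix} + \frac{1}{\zeta_m^{bb}(n)}\,\tilde{\mathbf{g}}_m(n)\tilde{\mathbf{g}}_m^T(n) \tag{13.56}$$

To this end, we multiply the right-hand side of Eq. (13.56) by $\boldsymbol{\Psi}_{m+1}(n)$ and show that
the result is the (m + 1)-by-(m + 1) identity matrix. The two separate expressions arising
from this multiplication are

$$\mathbf{A} = \boldsymbol{\Psi}_{m+1}(n)\begin{bmatrix} \boldsymbol{\Psi}_m^{-1}(n) & \mathbf{0}_m \\ \mathbf{0}_m^T & 0 \end{bmatrix} \tag{13.57}$$

and

$$\mathbf{B} = \frac{1}{\zeta_m^{bb}(n)}\,\boldsymbol{\Psi}_{m+1}(n)\tilde{\mathbf{g}}_m(n)\tilde{\mathbf{g}}_m^T(n) \tag{13.58}$$

Substituting Eq. (13.37) in Eq. (13.57), we obtain

$$\mathbf{A} = \begin{bmatrix} \boldsymbol{\Psi}_m(n) & \boldsymbol{\psi}_m^{b}(n) \\ \boldsymbol{\psi}_m^{bT}(n) & \sum_{k=1}^{n}\lambda^{n-k}x^2(k-m) \end{bmatrix}\begin{bmatrix} \boldsymbol{\Psi}_m^{-1}(n) & \mathbf{0}_m \\ \mathbf{0}_m^T & 0 \end{bmatrix} = \begin{bmatrix} \mathbf{I}_m & \mathbf{0}_m \\ \boldsymbol{\psi}_m^{bT}(n)\boldsymbol{\Psi}_m^{-1}(n) & 0 \end{bmatrix} \tag{13.59}$$

where $\mathbf{I}_m$ is the m-by-m identity matrix. Furthermore, recalling that $\boldsymbol{\Psi}_m^{-1}(n)$ is a symmetric
matrix and using Eq. (13.11), we obtain

$$\boldsymbol{\psi}_m^{bT}(n)\boldsymbol{\Psi}_m^{-1}(n) = \left(\boldsymbol{\Psi}_m^{-1}(n)\boldsymbol{\psi}_m^{b}(n)\right)^T = \mathbf{g}_m^T(n) \tag{13.60}$$

Substituting Eq. (13.60) in Eq. (13.59), we obtain

$$\mathbf{A} = \begin{bmatrix} \mathbf{I}_m & \mathbf{0}_m \\ \mathbf{g}_m^T(n) & 0 \end{bmatrix} \tag{13.61}$$

In addition, substituting Eq. (13.39) in Eq. (13.58), we obtain

$$\mathbf{B} = \frac{1}{\zeta_m^{bb}(n)}\begin{bmatrix} \mathbf{0}_m \\ \zeta_m^{bb}(n) \end{bmatrix}\tilde{\mathbf{g}}_m^T(n) = \begin{bmatrix} \mathbf{0}_{m\times m} & \mathbf{0}_m \\ -\mathbf{g}_m^T(n) & 1 \end{bmatrix} \tag{13.62}$$

Adding the results of Eqs. (13.61) and (13.62), we obtain

$$\mathbf{A} + \mathbf{B} = \mathbf{I}_{m+1}$$

This completes the proof of Eq. (13.56).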
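Premultiplying and postmultiplying Eq. (13.56) by $\mathbf{x}_{m+1}^T(n)$ and $\mathbf{x}_{m+1}(n)$, respectively,
and noting that

$$\mathbf{x}_{m+1}(n) = \begin{bmatrix} \mathbf{x}_m(n) \\ x(n-m) \end{bmatrix} \quad\text{and}\quad \mathbf{k}_m(n) = \boldsymbol{\Psi}_m^{-1}(n)\mathbf{x}_m(n)$$

we obtain

$$\mathbf{x}_{m+1}^T(n)\mathbf{k}_{m+1}(n) = \mathbf{x}_m^T(n)\mathbf{k}_m(n) + \frac{\left(\tilde{\mathbf{g}}_m^T(n)\mathbf{x}_{m+1}(n)\right)^2}{\zeta_m^{bb}(n)} \tag{13.63}$$

Using Eq. (13.40), we obtain

$$\tilde{\mathbf{g}}_m^T(n)\mathbf{x}_{m+1}(n) = x(n-m) - \mathbf{g}_m^T(n)\mathbf{x}_m(n) = b_{m,n}(n) \tag{13.64}$$

Finally, substituting Eq. (13.64) in Eq. (13.63), subtracting both sides of the result from
1, and recalling Eq. (13.53), we obtain

$$\gamma_{m+1}(n) = \gamma_m(n) - \frac{b_{m,n}^2(n)}{\zeta_m^{bb}(n)} \tag{13.65}$$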
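As a side remark that is not spelled out at this point in the text, Eq. (13.65) immediately orders the conversion factors across the stages of the lattice. The short derivation below assumes only that $\zeta_m^{bb}(n) > 0$ and that the zeroth-order conversion factor is $\gamma_0(n) = 1$, as used in Table 13.1.

```latex
\gamma_m(n) - \gamma_{m+1}(n) = \frac{b_{m,n}^2(n)}{\zeta_m^{bb}(n)} \ge 0
\quad\Longrightarrow\quad
\gamma_{m+1}(n) \le \gamma_m(n) \le \cdots \le \gamma_1(n) \le \gamma_0(n) = 1 .
```

In words, every additional lattice stage can only reduce the conversion factor, which is the statement of Problem P13.9.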
13.4.5 Update Equation for Cross-Correlations 


The remaining recursions needed to complete the derivation of the RLSL algorithm are
the update equations for the cross-correlations $\zeta_m^{fb}(n)$ and $\zeta_m^{be}(n)$.
Recall the RLS recursion for the mth-order forward transversal predictor

$$\mathbf{a}_m(n) = \mathbf{a}_m(n-1) + \mathbf{k}_m(n-1)\, f_{m,n-1}(n) \tag{13.66}$$

where $\mathbf{k}_m(n-1)$ and $f_{m,n-1}(n)$, as defined earlier, are the gain vector and a priori
estimation error of the forward predictor, respectively. The samples of the a posteriori
estimation error of the forward predictor, for k = 1, 2, ..., n, are given by

$$f_{m,n}(k) = x(k) - \mathbf{a}_m^T(n)\mathbf{x}_m(k-1) \tag{13.67}$$

Substituting Eq. (13.66) in Eq. (13.67) and rearranging, we obtain

$$f_{m,n}(k) = f_{m,n-1}(k) - \mathbf{k}_m^T(n-1)\mathbf{x}_m(k-1)\, f_{m,n-1}(n) \tag{13.68}$$

where

$$f_{m,n-1}(k) = x(k) - \mathbf{a}_m^T(n-1)\mathbf{x}_m(k-1) \tag{13.69}$$

for k = 1, 2, ..., n, are samples of the a priori forward prediction error.

In addition, recall the RLS recursion for the mth-order backward predictor

$$\mathbf{g}_m(n) = \mathbf{g}_m(n-1) + \mathbf{k}_m(n)\, b_{m,n-1}(n) \tag{13.70}$$

where $\mathbf{k}_m(n)$ and $b_{m,n-1}(n)$, as defined earlier, are the gain vector and a priori estimation
error of the backward predictor, respectively. The samples of the a posteriori estimation
error of the backward predictor, for k = 1, 2, ..., n, are given by

$$b_{m,n}(k) = x(k-m) - \mathbf{g}_m^T(n)\mathbf{x}_m(k) \tag{13.71}$$

Substituting Eq. (13.70) in Eq. (13.71) and rearranging, we obtain

$$b_{m,n}(k) = b_{m,n-1}(k) - \mathbf{k}_m^T(n)\mathbf{x}_m(k)\, b_{m,n-1}(n) \tag{13.72}$$

where

$$b_{m,n-1}(k) = x(k-m) - \mathbf{g}_m^T(n-1)\mathbf{x}_m(k) \tag{13.73}$$

for k = 1, 2, ..., n, are samples of the a priori backward prediction error.


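Next, substituting Eqs. (13.68) and (13.72) in Eq. (13.28) and expanding, we obtain

$$\begin{aligned}
\zeta_m^{fb}(n) ={}& \sum_{k=1}^{n}\lambda^{n-k} f_{m,n-1}(k)\, b_{m,n-2}(k-1) \\
&- \mathbf{k}_m^T(n-1)\, b_{m,n-2}(n-1)\sum_{k=1}^{n}\lambda^{n-k} f_{m,n-1}(k)\,\mathbf{x}_m(k-1) \\
&- \mathbf{k}_m^T(n-1)\, f_{m,n-1}(n)\sum_{k=1}^{n}\lambda^{n-k} b_{m,n-2}(k-1)\,\mathbf{x}_m(k-1) \\
&+ \mathbf{k}_m^T(n-1)\boldsymbol{\Psi}_m(n-1)\mathbf{k}_m(n-1)\, f_{m,n-1}(n)\, b_{m,n-2}(n-1)
\end{aligned} \tag{13.74}$$

where, to obtain the last term, we have noted that $\mathbf{k}_m^T(n-1)\mathbf{x}_m(k-1) = \mathbf{x}_m^T(k-1)\mathbf{k}_m(n-1)$
and also

$$\sum_{k=1}^{n}\lambda^{n-k}\mathbf{x}_m(k-1)\mathbf{x}_m^T(k-1) = \sum_{k=1}^{n-1}\lambda^{n-1-k}\mathbf{x}_m(k)\mathbf{x}_m^T(k) = \boldsymbol{\Psi}_m(n-1)$$

as $\mathbf{x}_m(0) = \mathbf{0}$ because of prewindowing.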
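We treat the four terms on the right-hand side of Eq. (13.74) separately:

• First term: We note that

$$\begin{aligned}
\sum_{k=1}^{n}\lambda^{n-k} f_{m,n-1}(k)\, b_{m,n-2}(k-1)
&= \lambda\sum_{k=1}^{n-1}\lambda^{n-1-k} f_{m,n-1}(k)\, b_{m,n-2}(k-1) + f_{m,n-1}(n)\, b_{m,n-2}(n-1) \\
&= \lambda\zeta_m^{fb}(n-1) + f_{m,n-1}(n)\, b_{m,n-2}(n-1)
\end{aligned} \tag{13.75}$$

• Second term: We first note that

$$\begin{aligned}
\sum_{k=1}^{n}\lambda^{n-k} f_{m,n-1}(k)\,\mathbf{x}_m(k-1)
&= \lambda\sum_{k=1}^{n-1}\lambda^{n-1-k} f_{m,n-1}(k)\,\mathbf{x}_m(k-1) + f_{m,n-1}(n)\,\mathbf{x}_m(n-1) \\
&= f_{m,n-1}(n)\,\mathbf{x}_m(n-1)
\end{aligned}$$

where the last equality follows from Eq. (13.7), with n replaced by n - 1. Using this
result, we obtain

$$\mathbf{k}_m^T(n-1)\, b_{m,n-2}(n-1)\sum_{k=1}^{n}\lambda^{n-k} f_{m,n-1}(k)\,\mathbf{x}_m(k-1) = \mathbf{k}_m^T(n-1)\mathbf{x}_m(n-1)\, f_{m,n-1}(n)\, b_{m,n-2}(n-1) \tag{13.76}$$

• Third term: Using the change of variable l = k - 1, we obtain

$$\begin{aligned}
\sum_{k=1}^{n}\lambda^{n-k} b_{m,n-2}(k-1)\,\mathbf{x}_m(k-1)
&= \sum_{l=0}^{n-1}\lambda^{n-1-l} b_{m,n-2}(l)\,\mathbf{x}_m(l) \\
&= \sum_{l=1}^{n-1}\lambda^{n-1-l} b_{m,n-2}(l)\,\mathbf{x}_m(l) \\
&= \lambda\sum_{l=1}^{n-2}\lambda^{n-2-l} b_{m,n-2}(l)\,\mathbf{x}_m(l) + b_{m,n-2}(n-1)\,\mathbf{x}_m(n-1) \\
&= b_{m,n-2}(n-1)\,\mathbf{x}_m(n-1)
\end{aligned}$$

where we have used $\mathbf{x}_m(0) = \mathbf{0}$ (because of prewindowing) in the second step and Eq.
(13.16) with n replaced by n - 2 for the last step. Using this result, we obtain

$$\mathbf{k}_m^T(n-1)\, f_{m,n-1}(n)\sum_{k=1}^{n}\lambda^{n-k} b_{m,n-2}(k-1)\,\mathbf{x}_m(k-1) = \mathbf{k}_m^T(n-1)\mathbf{x}_m(n-1)\, f_{m,n-1}(n)\, b_{m,n-2}(n-1) \tag{13.77}$$

• Fourth term: Using Eq. (13.10), we obtain

$$\boldsymbol{\Psi}_m(n-1)\mathbf{k}_m(n-1) = \mathbf{x}_m(n-1)$$

Thus,

$$\mathbf{k}_m^T(n-1)\boldsymbol{\Psi}_m(n-1)\mathbf{k}_m(n-1)\, f_{m,n-1}(n)\, b_{m,n-2}(n-1) = \mathbf{k}_m^T(n-1)\mathbf{x}_m(n-1)\, f_{m,n-1}(n)\, b_{m,n-2}(n-1) \tag{13.78}$$

Substituting Eqs. (13.75), (13.76), (13.77), and (13.78) in Eq. (13.74) and rearranging,
we obtain

$$\zeta_m^{fb}(n) = \lambda\zeta_m^{fb}(n-1) + \left[1 - \mathbf{k}_m^T(n-1)\mathbf{x}_m(n-1)\right] f_{m,n-1}(n)\, b_{m,n-2}(n-1) \tag{13.79}$$

Next, noting that $1 - \mathbf{k}_m^T(n-1)\mathbf{x}_m(n-1) = \gamma_m(n-1)$ and $\gamma_m(n-1)\, b_{m,n-2}(n-1) = b_{m,n-1}(n-1)$,
according to Eqs. (13.53) and (13.54), respectively, Eq. (13.79) can be
simplified as

$$\zeta_m^{fb}(n) = \lambda\zeta_m^{fb}(n-1) + f_{m,n-1}(n)\, b_{m,n-1}(n-1) \tag{13.80}$$

Following a similar line of derivations, we also obtain

$$\zeta_m^{be}(n) = \lambda\zeta_m^{be}(n-1) + e_{m,n-1}(n)\, b_{m,n}(n) \tag{13.81}$$

We have now developed all the basic equations/recursions necessary for implementation
of the RLSL algorithms.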


13.4.6 RLSL Algorithm Using A Posteriori Errors 


Table 13.1 lists a possible implementation of the RLSL algorithm that uses the a posteriori
estimation errors. For every iteration, the algorithm begins with the initial values of
$f_{0,n}(n)$, $b_{0,n}(n)$, $e_{0,n}(n)$, and $\gamma_0(n)$, as inputs to the first stage and proceeds with updating
the successive stages of the lattice in a for loop. The operations in this loop may be
divided into those related to forward and backward predictions and the operations related
to the filtering. In the prediction part, the recursive Eqs. (13.45), (13.46), and (13.80)
are used to update $\zeta_m^{ff}(n)$, $\zeta_m^{bb}(n)$, and $\zeta_m^{fb}(n)$, respectively. Here, we have also used Eqs.
(13.52) and (13.54) to write the recursions in terms of only the a posteriori estimation
errors, $f_{m,n}(n)$ and $b_{m,n}(n)$. The results of these recursions are then used to calculate
the PARCOR coefficients $\kappa_{m+1}^{f}(n)$ and $\kappa_{m+1}^{b}(n)$ according to Eqs. (13.30) and (13.31),
respectively. This is followed by the order-update equations for the computation of the a
posteriori estimation errors of the forward and backward predictors. These follow from
Figure 13.3 (see also Chapter 11). The filtering is done in a similar way using the recursion
Eq. (13.47) and Eqs. (13.32) and (13.55). Finally, the conversion factor $\gamma_m(n)$ is updated
according to recursion Eq. (13.65).
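The rewriting in terms of a posteriori errors mentioned above can be made explicit. The lines below are a sketch of that step, assuming that Eqs. (13.45) and (13.46) have the same product form that appears for order N in Table 13.4 (an a posteriori error multiplied by the matching a priori error); Eqs. (13.52) and (13.54) are then used to eliminate the a priori errors.

```latex
\begin{aligned}
\zeta_m^{ff}(n) &= \lambda\zeta_m^{ff}(n-1) + f_{m,n}(n)\,f_{m,n-1}(n)
                 = \lambda\zeta_m^{ff}(n-1) + \frac{f_{m,n}^2(n)}{\gamma_m(n-1)},\\[4pt]
\zeta_m^{bb}(n) &= \lambda\zeta_m^{bb}(n-1) + b_{m,n}(n)\,b_{m,n-1}(n)
                 = \lambda\zeta_m^{bb}(n-1) + \frac{b_{m,n}^2(n)}{\gamma_m(n)},\\[4pt]
\zeta_m^{fb}(n) &= \lambda\zeta_m^{fb}(n-1) + f_{m,n-1}(n)\,b_{m,n-1}(n-1)
                 = \lambda\zeta_m^{fb}(n-1) + \frac{f_{m,n}(n)\,b_{m,n-1}(n-1)}{\gamma_m(n-1)}.
\end{aligned}
```

These are the three correlation updates that appear inside the loop of Table 13.1.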

Theoretically, the auto- and cross-correlations should all be initialized to zero. However,
as such initialization of $\zeta_m^{ff}(n)$ and $\zeta_m^{bb}(n)$ results in division by zero during the first few
iterations of the algorithm, $\zeta_m^{ff}(0)$ and $\zeta_m^{bb}(0)$, for m = 0, 1, ..., N - 1, are initialized to
a small positive number, $\delta$, to prevent this numerical difficulty. The cross-correlations
$\zeta_m^{fb}(0)$ and $\zeta_m^{be}(0)$ are initialized to zero.
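The recursions of Table 13.1, together with the initialization just described, can be sketched in a few lines of code. The following is an illustrative transcription rather than the book's own program: the function names, the default values of `lam` and `delta`, and the three-tap plant used in `demo_rlsl` are assumptions made for the example.

```python
import random


def rlsl_a_posteriori(x_seq, d_seq, num_stages, lam=0.95, delta=0.01):
    """Joint-process RLSL filter transcribed from Table 13.1.

    Returns the a posteriori errors e_{N,n}(n) of the last stage.
    """
    N = num_stages
    zff = [delta] * N        # zeta^ff_m(n-1), assumed soft-start value
    zbb = [delta] * N        # zeta^bb_m(n-1), assumed soft-start value
    zfb = [0.0] * N          # zeta^fb_m(n-1)
    zbe = [0.0] * N          # zeta^be_m(n-1)
    b_old = [0.0] * N        # b_{m,n-1}(n-1); zero because of prewindowing
    gamma_old = [1.0] * N    # gamma_m(n-1)
    errors = []
    for x, d in zip(x_seq, d_seq):
        f, b, e, gamma = x, x, d, 1.0            # inputs to stage 0
        b_new, gamma_new = [0.0] * N, [0.0] * N
        for m in range(N):
            b_new[m], gamma_new[m] = b, gamma    # saved for the next time step
            zbb_prev = zbb[m]                    # zeta^bb_m(n-1), used by kappa^f
            zff[m] = lam * zff[m] + f * f / gamma_old[m]
            zbb[m] = lam * zbb[m] + b * b / gamma
            zfb[m] = lam * zfb[m] + f * b_old[m] / gamma_old[m]
            kappa_f = zfb[m] / zbb_prev
            kappa_b = zfb[m] / zff[m]
            f_next = f - kappa_f * b_old[m]
            b_next = b_old[m] - kappa_b * f
            zbe[m] = lam * zbe[m] + e * b / gamma
            c = zbe[m] / zbb[m]
            e = e - c * b                        # e_{m+1,n}(n)
            gamma = gamma - b * b / zbb[m]       # Eq. (13.65)
            f, b = f_next, b_next
        b_old, gamma_old = b_new, gamma_new
        errors.append(e)
    return errors


def demo_rlsl(num_samples=600, seed=3):
    """Noise-free identification of a hypothetical 3-tap plant."""
    rng = random.Random(seed)
    h = [0.5, -0.4, 0.2]     # illustrative plant, not taken from the text
    x_seq = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    d_seq = [sum(h[i] * x_seq[n - i] for i in range(len(h)) if n - i >= 0)
             for n in range(num_samples)]
    return rlsl_a_posteriori(x_seq, d_seq, num_stages=len(h))
```

Because the desired signal in `demo_rlsl` is an exact three-tap function of the input, the final-stage a posteriori error should decay towards zero once the influence of the initial value `delta` has been forgotten.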


13.4.7 RLSL Algorithm with Error Feedback 


The RLSL algorithm given in Table 13.1 uses the auto- and cross-correlations of the input
signals to the successive stages of the lattice to calculate the coefficients $\kappa_{m+1}^{f}(n)$, $\kappa_{m+1}^{b}(n)$, and
$c_m(n)$, according to Eqs. (13.30), (13.31), and (13.32), respectively. Alternatively, we can
develop a set of recursive equations for updating the coefficients $\kappa_{m+1}^{f}(n)$, $\kappa_{m+1}^{b}(n)$, and $c_m(n)$.
This leads to an alternative implementation of the RLSL algorithm that has been found to
be less sensitive to numerical errors as compared with the algorithm of Table 13.1 (Ling,
1993).

Table 13.1 RLSL algorithm using the a posteriori estimation errors.

Input:  Latest sample of input, $x(n)$,
        Past values of
          the backward a posteriori estimation errors, $b_{m,n-1}(n-1)$,
          the auto- and cross-correlations, $\zeta_m^{ff}(n-1)$, $\zeta_m^{bb}(n-1)$, $\zeta_m^{fb}(n-1)$, and $\zeta_m^{be}(n-1)$,
          the conversion factors, $\gamma_m(n-1)$,
        for m = 0, 1, ..., N - 1.
Output: The updated values of
          the backward a posteriori estimation errors, $b_{m,n}(n)$,
          the auto- and cross-correlations, $\zeta_m^{ff}(n)$, $\zeta_m^{bb}(n)$, $\zeta_m^{fb}(n)$, and $\zeta_m^{be}(n)$,
          the conversion factors, $\gamma_m(n)$,
        for m = 0, 1, ..., N - 1.
        The lattice coefficients are also available at the end of each iteration.

$f_{0,n}(n) = b_{0,n}(n) = x(n)$
$e_{0,n}(n) = d(n)$
$\gamma_0(n) = 1$
for m = 0 to N - 1
    $\zeta_m^{ff}(n) = \lambda\zeta_m^{ff}(n-1) + f_{m,n}^2(n)/\gamma_m(n-1)$
    $\zeta_m^{bb}(n) = \lambda\zeta_m^{bb}(n-1) + b_{m,n}^2(n)/\gamma_m(n)$
    $\zeta_m^{fb}(n) = \lambda\zeta_m^{fb}(n-1) + f_{m,n}(n)\, b_{m,n-1}(n-1)/\gamma_m(n-1)$
    $\kappa_{m+1}^{f}(n) = \zeta_m^{fb}(n)/\zeta_m^{bb}(n-1)$
    $\kappa_{m+1}^{b}(n) = \zeta_m^{fb}(n)/\zeta_m^{ff}(n)$
    $f_{m+1,n}(n) = f_{m,n}(n) - \kappa_{m+1}^{f}(n)\, b_{m,n-1}(n-1)$
    $b_{m+1,n}(n) = b_{m,n-1}(n-1) - \kappa_{m+1}^{b}(n)\, f_{m,n}(n)$
    $\zeta_m^{be}(n) = \lambda\zeta_m^{be}(n-1) + e_{m,n}(n)\, b_{m,n}(n)/\gamma_m(n)$
    $c_m(n) = \zeta_m^{be}(n)/\zeta_m^{bb}(n)$
    $e_{m+1,n}(n) = e_{m,n}(n) - c_m(n)\, b_{m,n}(n)$
    $\gamma_{m+1}(n) = \gamma_m(n) - b_{m,n}^2(n)/\zeta_m^{bb}(n)$
end

Table 13.2 gives a summary of this alternative implementation of the RLSL algorithm.
We note that here all the errors are the a priori ones, while in Table 13.1, all the
equations are in terms of the a posteriori errors. In addition, the update equations of
the cross-correlations $\zeta_m^{fb}(n)$ and $\zeta_m^{be}(n)$ have been deleted in Table 13.2, as they are no
longer required. Instead, there are three recursions for time-updating the coefficients of
the lattice.

Table 13.2 RLSL algorithm using the a priori estimation errors with error feedback.

Input:  Latest sample of input, $x(n)$,
        Past values of
          the backward a priori estimation errors, $b_{m,n-2}(n-1)$,
          the autocorrelations, $\zeta_m^{ff}(n-1)$, $\zeta_m^{bb}(n-1)$,
          the lattice coefficients $\kappa_{m+1}^{f}(n-1)$, $\kappa_{m+1}^{b}(n-1)$, and $c_m(n-1)$,
          the conversion factors, $\gamma_m(n-1)$,
        for m = 0, 1, ..., N - 1.
Output: The updated values of
          the backward a priori estimation errors, $b_{m,n-1}(n)$,
          the autocorrelations, $\zeta_m^{ff}(n)$, $\zeta_m^{bb}(n)$,
          the lattice coefficients $\kappa_{m+1}^{f}(n)$, $\kappa_{m+1}^{b}(n)$, and $c_m(n)$,
          the conversion factors, $\gamma_m(n)$,
        for m = 0, 1, ..., N - 1.

$f_{0,n-1}(n) = b_{0,n-1}(n) = x(n)$
$e_{0,n-1}(n) = d(n)$
$\gamma_0(n) = 1$
for m = 0 to N - 1
    $\zeta_m^{ff}(n) = \lambda\zeta_m^{ff}(n-1) + \gamma_m(n-1)\, f_{m,n-1}^2(n)$
    $\zeta_m^{bb}(n) = \lambda\zeta_m^{bb}(n-1) + \gamma_m(n)\, b_{m,n-1}^2(n)$
    $f_{m+1,n-1}(n) = f_{m,n-1}(n) - \kappa_{m+1}^{f}(n-1)\, b_{m,n-2}(n-1)$
    $b_{m+1,n-1}(n) = b_{m,n-2}(n-1) - \kappa_{m+1}^{b}(n-1)\, f_{m,n-1}(n)$
    $\kappa_{m+1}^{f}(n) = \kappa_{m+1}^{f}(n-1) + \dfrac{\gamma_m(n-1)\, b_{m,n-2}(n-1)}{\zeta_m^{bb}(n-1)}\, f_{m+1,n-1}(n)$
    $\kappa_{m+1}^{b}(n) = \kappa_{m+1}^{b}(n-1) + \dfrac{\gamma_m(n-1)\, f_{m,n-1}(n)}{\zeta_m^{ff}(n)}\, b_{m+1,n-1}(n)$
    $e_{m+1,n-1}(n) = e_{m,n-1}(n) - c_m(n-1)\, b_{m,n-1}(n)$
    $c_m(n) = c_m(n-1) + \dfrac{\gamma_m(n)\, b_{m,n-1}(n)}{\zeta_m^{bb}(n)}\, e_{m+1,n-1}(n)$
    $\gamma_{m+1}(n) = \gamma_m(n) - \dfrac{\gamma_m^2(n)\, b_{m,n-1}^2(n)}{\zeta_m^{bb}(n)}$
end

Next, we explain the derivation of one of these recursions as an example.
The other two can be derived by following the same line of derivation.
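Recall that

$$\kappa_{m+1}^{f}(n) = \frac{\zeta_m^{fb}(n)}{\zeta_m^{bb}(n-1)} \tag{13.82}$$

Substituting Eq. (13.80) in Eq. (13.82), we get

$$\kappa_{m+1}^{f}(n) = \frac{\lambda\zeta_m^{fb}(n-1)}{\zeta_m^{bb}(n-1)} + \frac{f_{m,n-1}(n)\, b_{m,n-1}(n-1)}{\zeta_m^{bb}(n-1)} \tag{13.83}$$

However,

$$\begin{aligned}
\frac{\lambda\zeta_m^{fb}(n-1)}{\zeta_m^{bb}(n-1)}
&= \frac{\zeta_m^{fb}(n-1)}{\zeta_m^{bb}(n-2)}\cdot\frac{\lambda\zeta_m^{bb}(n-2)}{\zeta_m^{bb}(n-1)} \\
&= \kappa_{m+1}^{f}(n-1)\,\frac{\zeta_m^{bb}(n-1) - b_{m,n-1}(n-1)\, b_{m,n-2}(n-1)}{\zeta_m^{bb}(n-1)} \\
&= \kappa_{m+1}^{f}(n-1) - \frac{\kappa_{m+1}^{f}(n-1)\, b_{m,n-1}(n-1)\, b_{m,n-2}(n-1)}{\zeta_m^{bb}(n-1)}
\end{aligned} \tag{13.84}$$

where we have used Eq. (13.46) to replace $\lambda\zeta_m^{bb}(n-2)$ by $\zeta_m^{bb}(n-1) - b_{m,n-1}(n-1)\, b_{m,n-2}(n-1)$.
Substituting Eq. (13.84) in Eq. (13.83) and rearranging, we obtain

$$\begin{aligned}
\kappa_{m+1}^{f}(n) &= \kappa_{m+1}^{f}(n-1) + \frac{b_{m,n-1}(n-1)}{\zeta_m^{bb}(n-1)}\left(f_{m,n-1}(n) - \kappa_{m+1}^{f}(n-1)\, b_{m,n-2}(n-1)\right) \\
&= \kappa_{m+1}^{f}(n-1) + \frac{b_{m,n-1}(n-1)\, f_{m+1,n-1}(n)}{\zeta_m^{bb}(n-1)}
\end{aligned} \tag{13.85}$$

Finally, using Eq. (13.54) to convert the a posteriori estimation error $b_{m,n-1}(n-1)$ to its
equivalent a priori estimation error $b_{m,n-2}(n-1)$, in Eq. (13.85), we get

$$\kappa_{m+1}^{f}(n) = \kappa_{m+1}^{f}(n-1) + \frac{\gamma_m(n-1)\, b_{m,n-2}(n-1)}{\zeta_m^{bb}(n-1)}\, f_{m+1,n-1}(n) \tag{13.86}$$

which is the recursion used in Table 13.2 for adaptation of $\kappa_{m+1}^{f}(n)$. Following the same
line of derivation, we can also obtain the recursions associated with the adaptation of
$\kappa_{m+1}^{b}(n)$ and $c_m(n)$. This is left to the reader as an exercise.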
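For comparison with the a posteriori form, the error-feedback recursions of Table 13.2, including Eq. (13.86), can be transcribed in the same way. The sketch below is illustrative only: the function names, the default `lam` and `delta`, and the three-tap plant in `demo_rlsl_feedback` are assumptions, and the lattice coefficients are simply started from zero.

```python
import random


def rlsl_error_feedback(x_seq, d_seq, num_stages, lam=0.95, delta=0.01):
    """Joint-process RLSL filter with error feedback, transcribed from Table 13.2.

    Returns the a priori errors e_{N,n-1}(n) of the last stage.
    """
    N = num_stages
    zff = [delta] * N        # zeta^ff_m(n-1), assumed soft-start value
    zbb = [delta] * N        # zeta^bb_m(n-1), assumed soft-start value
    kf = [0.0] * N           # kappa^f_{m+1}(n-1)
    kb = [0.0] * N           # kappa^b_{m+1}(n-1)
    c = [0.0] * N            # c_m(n-1)
    b_old = [0.0] * N        # b_{m,n-2}(n-1); zero because of prewindowing
    gamma_old = [1.0] * N    # gamma_m(n-1)
    errors = []
    for x, d in zip(x_seq, d_seq):
        f, b, e, gamma = x, x, d, 1.0            # a priori inputs to stage 0
        b_new, gamma_new = [0.0] * N, [0.0] * N
        for m in range(N):
            b_new[m], gamma_new[m] = b, gamma    # saved for the next time step
            zbb_prev = zbb[m]                    # zeta^bb_m(n-1)
            zff[m] = lam * zff[m] + gamma_old[m] * f * f
            zbb[m] = lam * zbb[m] + gamma * b * b
            f_next = f - kf[m] * b_old[m]        # f_{m+1,n-1}(n)
            b_next = b_old[m] - kb[m] * f        # b_{m+1,n-1}(n)
            kf[m] += gamma_old[m] * b_old[m] / zbb_prev * f_next   # Eq. (13.86)
            kb[m] += gamma_old[m] * f / zff[m] * b_next
            e_next = e - c[m] * b                # e_{m+1,n-1}(n)
            c[m] += gamma * b / zbb[m] * e_next
            gamma = gamma - gamma * gamma * b * b / zbb[m]
            f, b, e = f_next, b_next, e_next
        b_old, gamma_old = b_new, gamma_new
        errors.append(e)
    return errors


def demo_rlsl_feedback(num_samples=600, seed=5):
    """Noise-free identification of a hypothetical 3-tap plant."""
    rng = random.Random(seed)
    h = [0.5, -0.4, 0.2]     # illustrative plant, not taken from the text
    x_seq = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    d_seq = [sum(h[i] * x_seq[n - i] for i in range(len(h)) if n - i >= 0)
             for n in range(num_samples)]
    return rlsl_error_feedback(x_seq, d_seq, num_stages=len(h))
```

Note that the coefficients are updated directly from the a priori errors of the next stage, so no cross-correlation has to be stored, in line with the remark made about Table 13.2.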


13.5 FTRLS Algorithm 


The fast transversal filter (FTF), or FTRLS, algorithm is another alternative numerical technique
for solving the least-squares problem. The main advantage of the FTRLS algorithm is its reduced
computational complexity as compared with other available solutions, such as the standard
RLS and RLSL algorithms. Table 13.3 summarizes the number of operations (additions,
multiplications, and divisions) required in each iteration of the standard RLS algorithm
(Table 12.2), the two versions of the RLSL algorithm presented in Tables 13.1 and 13.2,
and also the two versions of the FTRLS algorithm that will be discussed in this section, as
an indication of their computational complexity.² We note that as the filter length, N,
increases, the standard RLS becomes a rather expensive algorithm, as its computational
complexity grows in proportion to the square of the filter length. On the other hand, the
computational complexities of the RLSL and FTRLS algorithms grow only linearly with filter
length. In addition, we find that the FTRLS algorithm has only about half the complexity
of the RLSL algorithm. Unfortunately, such a significant reduction in the
complexity of the FTRLS algorithm does not come for free. Computer simulations and
also theoretical studies have shown that the FTRLS algorithms are, in general, highly
sensitive to roundoff error accumulation. Precautions have to be taken to deal with this
problem to prevent the algorithm from becoming unstable. It is generally suggested that
the algorithms should be reinitialized once a sign of instability is observed (Eleftheriou
and Falconer, 1987; Cioffi and Kailath, 1984). To reduce the chance of instability in the
FTRLS algorithm, a new version that is more robust against roundoff error accumulation
has been proposed by Slock and Kailath (1988, 1991). This is called the stabilized fast
transversal recursive least-squares (SFTRLS) algorithm. However, studies show that even
the SFTRLS algorithm has some limitations in the sense that it becomes unstable when
the forgetting factor, $\lambda$, is not close enough to 1. This definitely limits the applicability
of the FTRLS algorithm in cases where smaller values of $\lambda$ should be used to achieve
fast tracking (see Chapter 14).

Table 13.3 Computational complexity of various RLS algorithms.

Algorithm               No. of +, ×, and ÷ (added)
RLS (Table 12.2)        3.5N²
RLSL (Table 13.1)       28N
RLSL (Table 13.2)       31N
FTRLS (Table 13.4)      14N
FTRLS (stabilized)      18N

² We note that the number of operations, in general, may not be a fair measure in comparing various algorithms.
A fair comparison would only be possible if the platform over which the algorithms are implemented is known a
priori. For example, in hardware implementation, the modular structure of the RLSL may be very beneficial when
a pipeline structure is considered (Ling, 1993).
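To get a feeling for the counts in Table 13.3, the small helper below evaluates them for a given filter length. It is only an illustration of the table; the function name `operations_per_iteration` and the dictionary keys are choices made for this example.

```python
def operations_per_iteration(N):
    """Approximate number of +, x and / operations per iteration, from Table 13.3."""
    return {
        "RLS (Table 12.2)": 3.5 * N * N,
        "RLSL (Table 13.1)": 28 * N,
        "RLSL (Table 13.2)": 31 * N,
        "FTRLS (Table 13.4)": 14 * N,
        "FTRLS (stabilized)": 18 * N,
    }


if __name__ == "__main__":
    for length in (8, 32, 128):
        print(length, operations_per_iteration(length))
```

With these figures the standard RLS and the RLSL algorithm of Table 13.1 cost the same at N = 8 (224 operations each); for longer filters the quadratic growth of the standard RLS dominates.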


13.5.1 Derivation of the FTRLS Algorithm 


The FTRLS algorithm, basically, takes advantage of the interrelationships that exist 
between the forward and backward predictors as well as the joint process estimator when 
they share the same set of input samples. In particular, in the development of the RLSL 
algorithm in Section 13.4, we found that the forward and backward predictors and also the 
joint process estimator share the same conversion factor and gain vector. These properties 
led to a number of order- and time-update equations that were eventually put together to 
obtain the RLSL algorithm. In the RLSL algorithm, the problem of prediction and filtering 
(joint process estimation) is solved for orders of 1 to N, simultaneously. In cases where 
the goal is to solve the problem only for a filter of length N, this solution clearly has 
many redundant elements, which may unnecessarily complicate the solution. Accordingly, 
a set of equations that are limited to order N predictors and also to a length N filter (joint 
process estimator) may give a more efficient solution. This is the main essence of the 
FTRLS algorithm, when it is viewed as an improvement to the RLSL algorithm. 

To have a clear treatment of the FTRLS algorithm, we proceed with the derivations of
the necessary recursions separated into three sections, namely forward prediction, backward
prediction, and filtering.

Forward Prediction

Consider an Nth-order forward transversal predictor with tap-weight vector $\mathbf{a}_N(n)$ and
tap-input vector $\mathbf{x}_N(k-1) = [x(k-1)\ x(k-2)\ \cdots\ x(k-N)]^T$, for k = 1, 2, ..., n.
The RLS recursion for adaptive adjustment of $\mathbf{a}_N(n)$ is

$$\mathbf{a}_N(n) = \mathbf{a}_N(n-1) + \mathbf{k}_N(n-1)\, f_{N,n-1}(n) \tag{13.87}$$

where $\mathbf{k}_N(n-1)$ is the gain vector of the adaptation as defined in Eq. (13.10), and
$f_{N,n-1}(n)$ is the a priori estimation error of the forward predictor.
Let us define the normalized gain vector

$$\tilde{\mathbf{k}}_N(n) = \frac{\mathbf{k}_N(n)}{\gamma_N(n)} \tag{13.88}$$

where $\gamma_N(n)$ is the conversion factor as defined before. Substituting Eqs. (13.88) and
(13.52) in Eq. (13.87), we get

$$\begin{aligned}
\mathbf{a}_N(n) &= \mathbf{a}_N(n-1) + \tilde{\mathbf{k}}_N(n-1)\,\gamma_N(n-1)\, f_{N,n-1}(n) \\
&= \mathbf{a}_N(n-1) + \tilde{\mathbf{k}}_N(n-1)\, f_{N,n}(n)
\end{aligned} \tag{13.89}$$

where $f_{N,n}(n)$ is the a posteriori estimation error of the forward predictor. Furthermore,
using the definition (13.36), we may rewrite Eq. (13.89) as

$$\tilde{\mathbf{a}}_N(n) = \tilde{\mathbf{a}}_N(n-1) - \begin{bmatrix} 0 \\ \tilde{\mathbf{k}}_N(n-1) \end{bmatrix} f_{N,n}(n) \tag{13.90}$$

Next, we note that

$$\boldsymbol{\Psi}_{N+1}^{-1}(n) = \begin{bmatrix} 0 & \mathbf{0}_N^T \\ \mathbf{0}_N & \boldsymbol{\Psi}_N^{-1}(n-1) \end{bmatrix} + \frac{1}{\zeta_N^{ff}(n)}\,\tilde{\mathbf{a}}_N(n)\tilde{\mathbf{a}}_N^T(n) \tag{13.91}$$

This identity, which appears similar to Eq. (13.56), can also be proved in the same way
as Eq. (13.56). This is left to the reader as an exercise. Postmultiplying Eq. (13.91) by
$\mathbf{x}_{N+1}(n)$, recalling Eqs. (13.36), (13.5), (13.19), and (13.88), and noting that

$$\mathbf{x}_{N+1}(n) = \begin{bmatrix} x(n) \\ \mathbf{x}_N(n-1) \end{bmatrix},$$

we obtain

$$\gamma_{N+1}(n)\,\tilde{\mathbf{k}}_{N+1}(n) = \gamma_N(n-1)\begin{bmatrix} 0 \\ \tilde{\mathbf{k}}_N(n-1) \end{bmatrix} + \frac{f_{N,n}(n)}{\zeta_N^{ff}(n)}\,\tilde{\mathbf{a}}_N(n) \tag{13.92}$$

Substituting Eq. (13.90) in Eq. (13.92) and rearranging, we obtain

$$\gamma_{N+1}(n)\,\tilde{\mathbf{k}}_{N+1}(n) = \left(\gamma_N(n-1) - \frac{f_{N,n}^2(n)}{\zeta_N^{ff}(n)}\right)\begin{bmatrix} 0 \\ \tilde{\mathbf{k}}_N(n-1) \end{bmatrix} + \frac{f_{N,n}(n)}{\zeta_N^{ff}(n)}\,\tilde{\mathbf{a}}_N(n-1) \tag{13.93}$$

On the other hand, post- and premultiplying Eq. (13.91) by $\mathbf{x}_{N+1}(n)$ and $\mathbf{x}_{N+1}^T(n)$, respectively,
subtracting both sides of the result from unity, and recalling Eq. (13.53), we obtain

$$\gamma_{N+1}(n) = \gamma_N(n-1) - \frac{f_{N,n}^2(n)}{\zeta_N^{ff}(n)} \tag{13.94}$$

Substituting Eq. (13.94) in Eq. (13.93) and dividing both sides of the result by $\gamma_{N+1}(n)$,
we get

$$\tilde{\mathbf{k}}_{N+1}(n) = \begin{bmatrix} 0 \\ \tilde{\mathbf{k}}_N(n-1) \end{bmatrix} + \frac{f_{N,n}(n)}{\gamma_{N+1}(n)\,\zeta_N^{ff}(n)}\,\tilde{\mathbf{a}}_N(n-1) \tag{13.95}$$

Moreover, combining Eqs. (13.94) and (13.45), it is straightforward to show that (Problem
P13.17)

$$\gamma_{N+1}(n)\,\zeta_N^{ff}(n) = \lambda\,\gamma_N(n-1)\,\zeta_N^{ff}(n-1) \tag{13.96}$$

Finally, substituting Eq. (13.96) in Eq. (13.95) and using Eq. (13.52), we obtain

$$\tilde{\mathbf{k}}_{N+1}(n) = \begin{bmatrix} 0 \\ \tilde{\mathbf{k}}_N(n-1) \end{bmatrix} + \frac{\lambda^{-1} f_{N,n-1}(n)}{\zeta_N^{ff}(n-1)}\,\tilde{\mathbf{a}}_N(n-1) \tag{13.97}$$

This recursion gives a time as well as order update of the normalized gain vector. Next,
we develop another recursion that keeps the time index of the normalized gain vector
fixed at n, but reduces its length from N + 1 to N. This also leads to a time update of
the tap-weight vector of the backward predictor.


Backward Prediction 


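Consider Eq. (13.56) with m = N. Then, postmultiplying it by $\mathbf{x}_{N+1}(n)$ and recalling Eqs.
(13.40) and (13.88), we obtain

$$\gamma_{N+1}(n)\,\tilde{\mathbf{k}}_{N+1}(n) = \gamma_N(n)\begin{bmatrix} \tilde{\mathbf{k}}_N(n) \\ 0 \end{bmatrix} + \frac{b_{N,n}(n)}{\zeta_N^{bb}(n)}\,\tilde{\mathbf{g}}_N(n) \tag{13.98}$$

Equating the last elements of the vectors on both sides of Eq. (13.98) and rearranging,
we obtain

$$\tilde{k}_{N+1,N+1}(n) = \frac{b_{N,n}(n)}{\gamma_{N+1}(n)\,\zeta_N^{bb}(n)} \tag{13.99}$$

where $\tilde{k}_{N+1,N+1}(n)$ denotes the last element of $\tilde{\mathbf{k}}_{N+1}(n)$. On the other hand, combining
Eqs. (13.46) and (13.65), and replacing m by N, it is straightforward to show that (Problem
P13.18)

$$\gamma_{N+1}(n)\,\zeta_N^{bb}(n) = \lambda\,\gamma_N(n)\,\zeta_N^{bb}(n-1) \tag{13.100}$$

Substituting Eq. (13.100) in Eq. (13.99), recalling Eq. (13.54), and rearranging the result,
we get

$$b_{N,n-1}(n) = \lambda\,\zeta_N^{bb}(n-1)\,\tilde{k}_{N+1,N+1}(n) \tag{13.101}$$

Furthermore, solving Eq. (13.100) for $\gamma_N(n)$ and using Eq. (13.46), we obtain

$$\begin{aligned}
\gamma_N(n) &= \frac{\zeta_N^{bb}(n)}{\lambda\,\zeta_N^{bb}(n-1)}\,\gamma_{N+1}(n) \\
&= \left(\frac{\zeta_N^{bb}(n) - b_{N,n}(n)\, b_{N,n-1}(n)}{\zeta_N^{bb}(n)}\right)^{-1}\gamma_{N+1}(n) \\
&= \left(1 - \frac{b_{N,n}(n)\, b_{N,n-1}(n)}{\zeta_N^{bb}(n)}\right)^{-1}\gamma_{N+1}(n)
\end{aligned} \tag{13.102}$$

Substituting Eq. (13.99) in Eq. (13.102), we get

$$\gamma_N(n) = \left(1 - b_{N,n-1}(n)\,\gamma_{N+1}(n)\,\tilde{k}_{N+1,N+1}(n)\right)^{-1}\gamma_{N+1}(n) \tag{13.103}$$

Moreover, we note that the update recursion Eq. (13.18) (with m = N) may be rearranged as

$$\tilde{\mathbf{g}}_N(n) = \tilde{\mathbf{g}}_N(n-1) - \begin{bmatrix} \tilde{\mathbf{k}}_N(n) \\ 0 \end{bmatrix} b_{N,n}(n) \tag{13.104}$$

Substituting Eq. (13.104) in Eq. (13.98) and rearranging, we obtain

$$\gamma_{N+1}(n)\,\tilde{\mathbf{k}}_{N+1}(n) = \left(\gamma_N(n) - \frac{b_{N,n}^2(n)}{\zeta_N^{bb}(n)}\right)\begin{bmatrix} \tilde{\mathbf{k}}_N(n) \\ 0 \end{bmatrix} + \frac{b_{N,n}(n)}{\zeta_N^{bb}(n)}\,\tilde{\mathbf{g}}_N(n-1) \tag{13.105}$$

Finally, substituting Eq. (13.65) with m = N in Eq. (13.105), dividing both sides of the
result by $\gamma_{N+1}(n)$, and using Eq. (13.99), we obtain

$$\begin{bmatrix} \tilde{\mathbf{k}}_N(n) \\ 0 \end{bmatrix} = \tilde{\mathbf{k}}_{N+1}(n) - \tilde{k}_{N+1,N+1}(n)\,\tilde{\mathbf{g}}_N(n-1) \tag{13.106}$$

With this recursion, we recover the updated value of the normalized gain vector in the right
order, N. We can thus proceed with the next iteration of predictions and also use $\tilde{\mathbf{k}}_N(n)$
for adaptation of the tap-weight vector, $\hat{\mathbf{w}}_N(n)$, of an adaptive filter with tap-input vector
$\mathbf{x}_N(n)$, as explained below.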


Filtering 


Having obtained the normalized gain vector $\tilde{\mathbf{k}}_N(n)$, the following equations may be used
for adaptation of the tap-weight vector, $\hat{\mathbf{w}}_N(n)$, of an adaptive filter with tap-input vector
$\mathbf{x}_N(n)$.

We first obtain the a priori estimation error

$$e_{N,n-1}(n) = d(n) - \hat{\mathbf{w}}_N^T(n-1)\mathbf{x}_N(n) \tag{13.107}$$

Then, we calculate the corresponding a posteriori estimation error

$$e_{N,n}(n) = \gamma_N(n)\, e_{N,n-1}(n) \tag{13.108}$$

Finally, the update of the tap-weight vector of the adaptive filter is done according to the
recursion

$$\hat{\mathbf{w}}_N(n) = \hat{\mathbf{w}}_N(n-1) + \tilde{\mathbf{k}}_N(n)\, e_{N,n}(n) \tag{13.109}$$

We note that Eq. (13.109) is the same as the recursion Eq. (12.52). The only difference
is that Eq. (13.109) is written in terms of the normalized gain vector $\tilde{\mathbf{k}}_N(n)$ and, as a
result, the a priori estimation error $e_{N,n-1}(n)$ is replaced by the a posteriori estimation
error $e_{N,n}(n)$.


13.5.2 Summary of the FTRLS Algorithm 


In Table 13.4, we present a summary of the FTRLS algorithm by collecting together the
relevant equations from Section 13.4 and some of the new results that were developed in
this section.

As mentioned before, the FTRLS algorithm may experience numerical instability. To
deal with this problem, it has been noted that the sign of the expression

$$\beta(n) = 1 - b_{N,n-1}(n)\,\gamma_{N+1}(n)\,\tilde{k}_{N+1,N+1}(n) \tag{13.110}$$

is a good indication of the state of the algorithm with regard to its numerical instability.
From Eq. (13.103), we note that $\beta(n) = \gamma_{N+1}(n)/\gamma_N(n)$ and this has to be always
positive, as the conversion factors, $\gamma_N(n)$ and $\gamma_{N+1}(n)$, are nonnegative quantities (Problem
P13.8). However, studies have shown that the FTRLS algorithm has some unstable
modes that are not excited when infinite precision is assumed for arithmetic. Under finite
precision arithmetic, these unstable modes receive some excitation, which leads to
some misbehavior of the algorithm and eventually results in its divergence. In particular,
it has been noted that the quantity $\beta(n)$ becomes negative just before divergence
of the algorithm occurs (Cioffi and Kailath, 1984). For this reason, $\beta(n)$ is called the rescue
variable, and it is suggested that once a negative value of $\beta(n)$ is observed, the normal
execution of the FTRLS algorithm must be stopped and the algorithm should be restarted. In that
case, the latest values of the filter coefficients may be used for a soft reinitialization of
the algorithm (see Cioffi and Kailath (1984) for the reinitialization procedure).
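The recursions collected in Table 13.4, including the rescue variable of Eq. (13.110), can be sketched as follows. This is an illustrative transcription, not the book's program. The initial values are an assumption of this example: $\zeta_N^{ff}(0) = \delta$ and $\zeta_N^{bb}(0) = \delta\lambda^{-N}$, with all vectors zero except for the unit elements of $\tilde{\mathbf{a}}_N$ and $\tilde{\mathbf{g}}_N$, which is one common soft-constrained choice. The function names and the three-tap plant are likewise hypothetical, and the code only records $\beta(n)$ rather than performing any rescue.

```python
import random


def ftrls(x_seq, d_seq, N, lam=0.99, delta=0.01):
    """FTRLS filter transcribed from Table 13.4.

    Returns (a priori errors e_{N,n-1}(n), rescue variables beta(n)).
    """
    a_t = [1.0] + [0.0] * N       # augmented forward predictor [1, -a_N(n-1)]
    g_t = [0.0] * N + [1.0]       # augmented backward predictor [-g_N(n-1), 1]
    w = [0.0] * N                 # filter tap weights w_N(n-1)
    k_t = [0.0] * N               # normalized gain vector of order N at time n-1
    x_vec = [0.0] * N             # x_N(n-1); zero because of prewindowing
    zff = delta                   # zeta^ff_N(n-1), assumed initial value
    zbb = delta * lam ** (-N)     # zeta^bb_N(n-1), assumed initial value
    gamma = 1.0                   # gamma_N(n-1)
    e_priors, betas = [], []
    for x, d in zip(x_seq, d_seq):
        # Forward prediction
        x_ext = [x] + x_vec                                   # x_{N+1}(n)
        f_prior = sum(ai * xi for ai, xi in zip(a_t, x_ext))
        f_post = gamma * f_prior
        zff_new = lam * zff + f_post * f_prior
        gamma_ext = lam * zff / zff_new * gamma               # gamma_{N+1}(n)
        scale = f_prior / (lam * zff)
        k_ext = [scale * a_t[0]] + [k_t[i] + scale * a_t[i + 1] for i in range(N)]
        a_t = [a_t[0]] + [a_t[i + 1] - k_t[i] * f_post for i in range(N)]
        zff = zff_new
        # Backward prediction
        k_last = k_ext[N]
        b_prior = lam * zbb * k_last
        beta = 1.0 - b_prior * gamma_ext * k_last             # rescue variable
        gamma = gamma_ext / beta                              # gamma_N(n)
        b_post = gamma * b_prior
        zbb = lam * zbb + b_post * b_prior
        k_t = [k_ext[i] - k_last * g_t[i] for i in range(N)]  # first N elements
        g_t = [g_t[i] - k_t[i] * b_post for i in range(N)] + [g_t[N]]
        # Filtering
        x_vec = x_ext[:N]                                     # x_N(n)
        e_prior = d - sum(wi * xi for wi, xi in zip(w, x_vec))
        e_post = gamma * e_prior
        w = [wi + ki * e_post for wi, ki in zip(w, k_t)]
        e_priors.append(e_prior)
        betas.append(beta)
    return e_priors, betas


def demo_ftrls(num_samples=300, seed=7):
    """Noise-free identification of a hypothetical 3-tap plant."""
    rng = random.Random(seed)
    h = [0.5, -0.4, 0.2]          # illustrative plant, not taken from the text
    x_seq = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    d_seq = [sum(h[i] * x_seq[n - i] for i in range(len(h)) if n - i >= 0)
             for n in range(num_samples)]
    return ftrls(x_seq, d_seq, N=len(h))
```

In a short double-precision run such as `demo_ftrls`, the recorded values of $\beta(n)$ are expected to stay positive. In a practical implementation, a negative value would be the cue to stop and reinitialize, as described above.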


13.5.3 Stabilized FTRLS Algorithm 


Table 13.4 The FTRLS algorithm.

Input:  Tap-input vector $\mathbf{x}_N(n-1)$, Desired output $d(n)$,
        Tap-weight vectors $\tilde{\mathbf{a}}_N(n-1)$, $\tilde{\mathbf{g}}_N(n-1)$ and $\hat{\mathbf{w}}_N(n-1)$,
        Normalized gain vector $\tilde{\mathbf{k}}_N(n-1)$, and Least-squares sums $\zeta_N^{ff}(n-1)$ and $\zeta_N^{bb}(n-1)$.
Output: The updated values of
        $\tilde{\mathbf{a}}_N(n)$, $\tilde{\mathbf{g}}_N(n)$, $\hat{\mathbf{w}}_N(n)$, $\tilde{\mathbf{k}}_N(n)$, $\zeta_N^{ff}(n)$ and $\zeta_N^{bb}(n)$.

Prediction:
    $f_{N,n-1}(n) = \tilde{\mathbf{a}}_N^T(n-1)\mathbf{x}_{N+1}(n)$
    $f_{N,n}(n) = \gamma_N(n-1)\, f_{N,n-1}(n)$
    $\zeta_N^{ff}(n) = \lambda\zeta_N^{ff}(n-1) + f_{N,n}(n)\, f_{N,n-1}(n)$
    $\gamma_{N+1}(n) = \lambda\,\dfrac{\zeta_N^{ff}(n-1)}{\zeta_N^{ff}(n)}\,\gamma_N(n-1)$
    $\tilde{\mathbf{k}}_{N+1}(n) = \begin{bmatrix} 0 \\ \tilde{\mathbf{k}}_N(n-1) \end{bmatrix} + \lambda^{-1}\dfrac{f_{N,n-1}(n)}{\zeta_N^{ff}(n-1)}\,\tilde{\mathbf{a}}_N(n-1)$
    $\tilde{\mathbf{a}}_N(n) = \tilde{\mathbf{a}}_N(n-1) - \begin{bmatrix} 0 \\ \tilde{\mathbf{k}}_N(n-1) \end{bmatrix} f_{N,n}(n)$
    $b_{N,n-1}(n) = \lambda\,\zeta_N^{bb}(n-1)\,\tilde{k}_{N+1,N+1}(n)$
    $\beta(n) = 1 - b_{N,n-1}(n)\,\gamma_{N+1}(n)\,\tilde{k}_{N+1,N+1}(n)$   (rescue variable)
    $\gamma_N(n) = \beta^{-1}(n)\,\gamma_{N+1}(n)$
    $b_{N,n}(n) = \gamma_N(n)\, b_{N,n-1}(n)$
    $\zeta_N^{bb}(n) = \lambda\zeta_N^{bb}(n-1) + b_{N,n}(n)\, b_{N,n-1}(n)$
    $\begin{bmatrix} \tilde{\mathbf{k}}_N(n) \\ 0 \end{bmatrix} = \tilde{\mathbf{k}}_{N+1}(n) - \tilde{k}_{N+1,N+1}(n)\,\tilde{\mathbf{g}}_N(n-1)$
    $\tilde{\mathbf{g}}_N(n) = \tilde{\mathbf{g}}_N(n-1) - \begin{bmatrix} \tilde{\mathbf{k}}_N(n) \\ 0 \end{bmatrix} b_{N,n}(n)$

Filtering:
    $e_{N,n-1}(n) = d(n) - \hat{\mathbf{w}}_N^T(n-1)\mathbf{x}_N(n)$
    $e_{N,n}(n) = \gamma_N(n)\, e_{N,n-1}(n)$
    $\hat{\mathbf{w}}_N(n) = \hat{\mathbf{w}}_N(n-1) + \tilde{\mathbf{k}}_N(n)\, e_{N,n}(n)$

Further developments in the FTRLS algorithm have shown that the use of a special error
feedback mechanism can greatly stabilize the FTRLS algorithm. It has been noted that by
introducing computational redundancy, that is, by computing certain quantities in different ways,
one can make specific measurements of the numerical errors present. These measurements
can then be fed back to modify the dynamics of error propagation such that the
unstable modes of the FTRLS algorithm are stabilized (Slock and Kailath, 1988, 1991).
The quantities that have been identified to be appropriate for this purpose are the backward
prediction error $b_{N,n-1}(n)$, the conversion factor $\gamma_{N+1}(n)$, and the last element of
the normalized gain vector $\tilde{\mathbf{k}}_{N+1}(n)$, that is, the three quantities used in computation of
$\beta(n)$ in Eq. (13.110). Slock and Kailath (1991) have proposed an elegant procedure for
exploiting these redundancies in the FTRLS algorithm and have come up with a stabilized
version of the FTRLS algorithm. However, as was noted before, even the stabilized
FTRLS algorithm has to be treated with some special care, which makes it rather restrictive
in applications. In particular, it has been found that the stability of the SFTRLS can
only be guaranteed when the forgetting factor, $\lambda$, is chosen very close to 1. As a rule of
thumb, it is suggested that $\lambda$ should be kept within the range

$$1 - \frac{1}{2N} < \lambda < 1 \tag{13.111}$$

where N is the length of the filter.
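As a quick numerical illustration of the rule of thumb (13.111), consider a filter of length N = 32, a value picked purely for the example. The second line uses the common approximation $1/(1-\lambda)$ for the memory length of exponential weighting, which is an assumption brought in here rather than a statement from this section.

```latex
N = 32:\qquad 1 - \frac{1}{2 \times 32} = 0.984375 < \lambda < 1,
\qquad\text{equivalently}\qquad \frac{1}{1-\lambda} > 2N = 64 .
```

Read this way, the rule asks for a memory of more than about twice the filter length, which is what limits the use of the stabilized algorithm when fast tracking calls for a smaller $\lambda$.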


Problems 


P13.1   Starting with Eq. (13.4) and using the principle of orthogonality, derive Eq.
        (13.8). In addition, by inserting the relevant variables in Eq. (12.16), suggest an
        alternative derivation of Eq. (13.8).

P13.2   Following a similar line of derivations as those in Problem P13.1, suggest two
        methods for derivation of Eq. (13.17).

P13.3   Work out the details of the derivations of the augmented normal Eqs. (13.35) and
        (13.39).

P13.4   Give a detailed derivation of Eqs. (13.23) and (13.24).

P13.5   Using the principle of orthogonality, prove Eq. (13.25).

P13.6   Consider the a posteriori forward and backward prediction errors $f_{i,n}(k)$ and
        $b_{i,n}(k)$, respectively, of a real-valued and prewindowed signal sequence $x(k)$.
        Prove the following results:

        (i)   [equation not recoverable from the source]
        (ii)  [equation not recoverable from the source]
        (iii) $\sum_{k=1}^{n}\lambda^{n-k} b_{m,n}(k)\, x(k-m) = \sum_{k=1}^{n}\lambda^{n-k} b_{m,n}^2(k)$
        (iv)  [equation not recoverable from the source]

P13.7   Consider the a priori forward and backward prediction errors $f_{i,n-1}(k)$ and
        $b_{i,n-1}(k)$ and also the associated a posteriori errors $f_{i,n}(k)$ and $b_{i,n}(k)$, respectively,
        of a real-valued and prewindowed signal sequence $x(k)$. Prove the following results:

        (i)   [equation not recoverable from the source]
        (ii)  [equation not recoverable from the source]

P13.8   Consider a linear adaptive filter with tap-input vectors $\mathbf{x}(1)$, $\mathbf{x}(2)$, ..., $\mathbf{x}(n)$, and
        desired output sequence

        $$d(k) = \begin{cases} 1, & k = n \\ 0, & k = 1, 2, \ldots, n-1 \end{cases}$$

        Find the least-squares error sum of this filter and show that it is equal to the
        conversion factor $\gamma(n)$ as given by Eq. (13.49) or (13.50). Then, prove that

        $$0 \le \gamma(n) \le 1$$

P13.9   Show that in a lattice structure, at any instant of time, n,

        $$\gamma_{m+1}(n) \le \gamma_m(n)$$

P13.10  Prove that

        $$\frac{\zeta_{m+1}^{ff}(n)}{\zeta_m^{ff}(n)} = \frac{\zeta_{m+1}^{bb}(n)}{\zeta_m^{bb}(n-1)} = 1 - \kappa_{m+1}^{f}(n)\,\kappa_{m+1}^{b}(n)$$

P13.11  Use the result of Problem P13.10 to derive the following update equations for
        the least-squares sums $\zeta_m^{ff}(n)$ and $\zeta_m^{bb}(n)$:

        $$\zeta_{m+1}^{ff}(n) = \zeta_m^{ff}(n) - \frac{\left(\zeta_m^{fb}(n)\right)^2}{\zeta_m^{bb}(n-1)}$$

        and

        $$\zeta_{m+1}^{bb}(n) = \zeta_m^{bb}(n-1) - \frac{\left(\zeta_m^{fb}(n)\right)^2}{\zeta_m^{ff}(n)}$$

P13.12  Obtain the normal equation that results from the least-squares optimization of the
        a posteriori estimation error $e_{N,n}(k)$ of the joint process estimator of Figure 13.3
        and show that this leads to the following set of independent equations:

        $$c_m(n) = \frac{\sum_{k=1}^{n}\lambda^{n-k} b_{m,n}(k)\, d(k)}{\sum_{k=1}^{n}\lambda^{n-k} b_{m,n}^2(k)}$$

        for m = 0, 1, ..., N - 1. Then, use the orthogonality of the backward errors
        $b_{m,n}(k)$, for m = 0, 1, ..., N, to convert these equations to those given in Eq.
        (13.26).

P13.13  Give a detailed proof of Eq. (13.81).

P13.14  Derive the update equations of $\kappa_{m+1}^{b}(n)$ and $c_m(n)$ that have appeared in
        Table 13.2.

P13.15  Prove the following identity:

        $$\boldsymbol{\Psi}_{m+1}^{-1}(n) = \begin{bmatrix} 0 & \mathbf{0}_m^T \\ \mathbf{0}_m & \boldsymbol{\Psi}_m^{-1}(n-1) \end{bmatrix} + \frac{1}{\zeta_m^{ff}(n)}\,\tilde{\mathbf{a}}_m(n)\tilde{\mathbf{a}}_m^T(n)$$

P13.16  Show that

        $$\gamma_{m+1}(n) = \gamma_m(n-1) - \frac{f_{m,n}^2(n)}{\zeta_m^{ff}(n)}$$

P13.17  Give a detailed derivation of Eq. (13.96).

P13.18  Give a detailed derivation of Eq. (13.100).

P13.19  Use the results of Problems P13.17 and P13.18 to obtain a time-update equation
        relating $\gamma_N(n)$ and $\gamma_N(n-1)$.

P13.20  Explore the possibility of rearranging the recursions/equations in Table 13.1 in
        terms of a priori estimation errors. Thus, suggest an alternative implementation of the
        RLSL algorithm using the a priori estimation errors.

14 


Tracking 


Our study of adaptive filters so far has been based on the assumption that the filter input
and its desired output are jointly stationary processes. Under this condition, the correlation
matrix, $\mathbf{R}$, of the filter input and the cross-correlation vector, $\mathbf{p}$, between the filter input and
its desired output are fixed quantities. Consequently, the performance surface of the filter
is also fixed, with its minimum point given by the Wiener-Hopf solution $\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p}$.
Comparison between different algorithms, thus, would be based on their convergence
behavior. In this context, superior algorithms are those with shorter convergence time.

In this chapter, we study another important aspect of adaptive filters. In many applications,
the underlying processes are nonstationary. As a result, the Wiener-Hopf solution,
$\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p}$, varies with time, since $\mathbf{R}$ and $\mathbf{p}$ are time varying. In such a situation, the adaptive
algorithm is expected not only to adapt the filter tap weights to a neighborhood of their
optimum values, but also to follow the variations of the optimum tap weights. The latter,
which is the subject of this chapter, is known as tracking.

Before we start our study on tracking, we would like to remark that there is a clear 
distinction between convergence and tracking. Convergence is a transient phenomenon. 
It refers to the behavior of a system (here, an adaptive filter) when it starts from an 
arbitrary initial condition and undergoes a transient period before it reaches its steady 
state. Tracking, on the other hand, is a steady-state phenomenon. It refers to the behavior 
of a system in following variations of its surrounding environment, after it has reached its 
steady state. An algorithm with good convergence properties does not necessarily possess 
a fast tracking capability and vice versa. Part of our effort in this chapter is to clarify this 
seemingly unusual behavior of adaptive algorithms. 


14.1 Formulation of the Tracking Problem 


Much of the work related to the tracking behavior of adaptive filters has been done in the
context of the modeling problem depicted in Figure 14.1. The plant is a linear multiple
regressor characterized by the equation


$$d(n) = \mathbf{w}_o^{\mathrm T}(n)\mathbf{x}(n) + e_o(n) \tag{14.1}$$

where $\mathbf{x}(n) = [x_0(n)\ x_1(n)\ \cdots\ x_{N-1}(n)]^{\mathrm T}$ is the tap-input vector,
$\mathbf{w}_o(n) = [w_{o,0}(n)\ w_{o,1}(n)\ \cdots\ w_{o,N-1}(n)]^{\mathrm T}$ is the plant tap-weight vector, $e_o(n)$ is the plant noise, and
$d(n)$ is the plant output. The presence of the time index $n$ in $\mathbf{w}_o(n)$ is to emphasize
that the plant tap-weight vector is time variant. This is unlike the notation $\mathbf{w}_o$ that was
used in the previous chapters to represent fixed plant weights. The role of the adaptive
algorithm is to follow the variations in $\mathbf{w}_o(n)$.

Figure 14.1 Linear multiple regressor.

The time-varying tap-weight vector $\mathbf{w}_o(n)$ is chosen to be a multivariate random-walk
process characterized by the difference equation

$$\mathbf{w}_o(n+1) = \mathbf{w}_o(n) + \boldsymbol{\varepsilon}_o(n) \tag{14.2}$$

where $\boldsymbol{\varepsilon}_o(n)$ is the process noise vector.
The following assumptions are made throughout this chapter:

1. The sequences $e_o(n)$, $\boldsymbol{\varepsilon}_o(n)$, and $\mathbf{x}(n)$ are zero-mean and stationary random processes.

2. The sequences $e_o(n)$, $\boldsymbol{\varepsilon}_o(n)$, and $\mathbf{x}(n)$ are statistically independent of one another.

3. The successive increments, $\boldsymbol{\varepsilon}_o(n)$, of the plant tap weights are independent. However,
the elements of $\boldsymbol{\varepsilon}_o(n)$, for a given $n$, may be statistically dependent.

4. At time $n$, the tap-weight vector $\mathbf{w}(n)$ of the adaptive filter is statistically independent
of $e_o(n)$ and $\mathbf{x}(n)$.



The validity of the last assumption (which is known as the independence assumption) is
justified only for small values of the step-size parameter(s) of the adaptation algorithm
(Chapter 6, Section 6.2). This is assumed to be true throughout our discussions in
this chapter.
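
The plant model of Eqs. (14.1) and (14.2) can be sketched in a few lines of code. The following is a minimal illustration only, not a listing from this book: the function name `simulate_plant`, the choice of white Gaussian regressors and Gaussian increments with equal variance at all taps, and the zero initial plant vector are all assumptions made for the sake of the example.

```python
import random


def simulate_plant(n_samples, n_taps, sigma_eps, sigma_e, seed=0):
    """Generate (x(n), d(n), w_o(n)) triples for a random-walk plant.

    Assumptions (not from the text): white Gaussian tap inputs with unit
    variance, Gaussian increments with the same variance at every tap,
    and w_o(0) = 0.
    """
    rng = random.Random(seed)
    w_o = [0.0] * n_taps
    records = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_taps)]   # tap-input vector x(n)
        e_o = rng.gauss(0.0, sigma_e)                       # plant noise e_o(n)
        d = sum(w * xi for w, xi in zip(w_o, x)) + e_o      # Eq. (14.1)
        records.append((x, d, list(w_o)))
        # Random-walk update of the plant tap weights, Eq. (14.2)
        w_o = [w + rng.gauss(0.0, sigma_eps) for w in w_o]
    return records


if __name__ == "__main__":
    data = simulate_plant(n_samples=5, n_taps=3, sigma_eps=0.01, sigma_e=0.1)
    for x, d, w_o in data:
        print(d, w_o)
```

Because the increments are drawn independently at every time index, the sketch is consistent with Assumption 3; correlated elements within one increment vector would require a different generator.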


14.2 Generalized Formulation of LMS Algorithm 


In this section, we present a generalized formulation of the LMS (least-mean-square)
algorithm, which can be used for a unified study of the tracking behavior of various
adaptive algorithms. The LMS recursion that we consider is

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\boldsymbol{\mu}e(n)\mathbf{x}(n) \tag{14.3}$$

where $e(n) = d(n) - y(n)$ is the output error, $y(n) = \mathbf{w}^{\mathrm T}(n)\mathbf{x}(n)$ is the filter output, $\mathbf{w}(n)$
and $\mathbf{x}(n)$ are the tap-weight and tap-input vectors, respectively, and $\boldsymbol{\mu}$ is a diagonal
matrix consisting of the step-size parameters corresponding to various taps of the filter.
These parameters, which are called $\mu_i$, $i = 0, 1, \ldots, N-1$, are assumed fixed in our
analysis. Furthermore, to keep Eq. (14.3) in its most general form, we follow the modeling
problem of Figure 14.1 and choose $\mathbf{x}(n) = [x_0(n)\ x_1(n)\ \cdots\ x_{N-1}(n)]^{\mathrm T}$. This allows for
the possibility that the tap inputs may not correspond to those from a tapped-delay-line.
The algorithms that are covered by Eq. (14.3) are:


Conventional LMS Algorithm: By choosing $\boldsymbol{\mu} = \mu\mathbf{I}$, where $\mu$ is a scalar step-size
parameter and $\mathbf{I}$ is the N-by-N identity matrix, Eq. (14.3) reduces to the conventional
LMS recursion.

TDLMS algorithm: The recursion (14.3) will be that of the TDLMS (transform domain
least-mean-square) algorithm, if $\mathbf{x}(n)$ is replaced by $\mathbf{x}_T(n) = \mathbf{T}\mathbf{x}(n)$, where $\mathbf{T}$ is a
transformation matrix and $\mathbf{x}(n)$ is the filter tap-input vector before transformation.
Moreover, we choose the normalized step-size parameters as (Chapter 7)

$$\mu_i = \frac{\mu'}{E[x_{T,i}^2(n)]}, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{14.4}$$

where $\mu'$ is a common scalar, $x_{T,i}(n)$ is the $i$th element of $\mathbf{x}_T(n)$, and $E[\cdot]$ denotes
statistical expectation.

In actual implementation of the TDLMS algorithm, the values of $E[x_{T,i}^2(n)]$ are
estimated through time averaging. However, to simplify our discussion, we assume
that such averages are known a priori; thus, the $\mu_i$'s are fixed in our study here.

Ideal LMS-Newton Algorithm: From Chapter 7, we recall that the ideal LMS-Newton
algorithm is equivalent to the TDLMS with $\mathbf{T}$ replaced by the Karhunen-Loève transform
(KLT) of the input process. Thus, the analysis that we do for the TDLMS algorithm
can be immediately applied to evaluate the tracking behavior of the ideal LMS-Newton
algorithm.

RLS Algorithm: In Chapter 12 we found that when the RLS algorithm has undergone a
large number of iterations so that it has reached its steady state, it can be approximated
by the LMS-Newton recursion (12.90). This implies that the tracking behaviors of the
RLS and LMS-Newton algorithms are about the same, as tracking refers to the steady-state
phase of the algorithms.
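
As a concrete reading of Eq. (14.3), the following sketch performs one iteration of the generalized LMS recursion with the diagonal matrix of step-size parameters stored as a plain list. The function name `generalized_lms_step` and the list-based representation are illustrative assumptions, not notation used elsewhere in this book.

```python
def generalized_lms_step(w, x, d, mu):
    """One iteration of Eq. (14.3) with a diagonal step-size matrix.

    w, x : current tap-weight and tap-input vectors (lists of floats)
    d    : desired output d(n)
    mu   : list holding the diagonal entries mu_i of the step-size matrix
    Returns the updated tap weights w(n + 1) and the output error e(n).
    """
    y = sum(wi * xi for wi, xi in zip(w, x))          # filter output y(n)
    e = d - y                                         # output error e(n)
    w_next = [wi + 2.0 * mi * e * xi for wi, mi, xi in zip(w, mu, x)]
    return w_next, e


if __name__ == "__main__":
    # Conventional LMS is the special case mu_i = mu for all taps.
    mu_scalar = 0.05
    w, e = generalized_lms_step([0.0, 0.0], [1.0, 2.0], 1.0, [mu_scalar] * 2)
    print(w, e)
```

For the TDLMS case the same function applies once `x` holds the transformed tap inputs and `mu` holds the normalized step sizes of Eq. (14.4).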


14.3 MSE Analysis of the Generalized LMS Algorithm 


In this section, we consider the performance of the generalized LMS recursion Eq. (14.3)
and derive an expression for its steady-state mean-squared error (MSE). Our discussion
is in the context of the modeling problem introduced in Section 14.1. Our derivations
here are similar to those in Chapter 6, where the convergence behavior of the LMS
algorithm was analyzed (Section 6.3). However, to overcome the analytical difficulties
arising from the use of different step-size parameters at various taps, we make some
further approximations.

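We note that

$$
\begin{aligned}
e(n) &= d(n) - \mathbf{w}^{\mathrm T}(n)\mathbf{x}(n) \\
&= d(n) - \mathbf{x}^{\mathrm T}(n)\mathbf{w}(n) \\
&= d(n) - \mathbf{x}^{\mathrm T}(n)\mathbf{w}_o(n) - \mathbf{x}^{\mathrm T}(n)[\mathbf{w}(n) - \mathbf{w}_o(n)] \\
&= e_o(n) - \mathbf{x}^{\mathrm T}(n)\mathbf{v}(n)
\end{aligned}
\tag{14.5}
$$

where $\mathbf{v}(n) = \mathbf{w}(n) - \mathbf{w}_o(n)$ is the weight-error vector, and from Eq. (14.1), $e_o(n) =
d(n) - \mathbf{x}^{\mathrm T}(n)\mathbf{w}_o(n)$. Substituting Eq. (14.5) in Eq. (14.3) and using Eq. (14.2), we obtain

$$\mathbf{v}(n+1) = (\mathbf{I} - 2\boldsymbol{\mu}\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n))\mathbf{v}(n) + 2\boldsymbol{\mu}e_o(n)\mathbf{x}(n) - \boldsymbol{\varepsilon}_o(n) \tag{14.6}$$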


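where $\mathbf{I}$ is the identity matrix. Next, we multiply both sides of Eq. (14.6) from the right by
their respective transposes, take statistical expectation of the results, and expand to obtain

$$
\begin{aligned}
\mathbf{K}(n+1) ={}& \mathbf{K}(n) - 2\boldsymbol{\mu}E[\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)\mathbf{v}(n)\mathbf{v}^{\mathrm T}(n)] \\
& - 2E[\mathbf{v}(n)\mathbf{v}^{\mathrm T}(n)\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)]\boldsymbol{\mu} \\
& + 4\boldsymbol{\mu}E[\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)\mathbf{v}(n)\mathbf{v}^{\mathrm T}(n)\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)]\boldsymbol{\mu} \\
& + 2E[(\mathbf{I} - 2\boldsymbol{\mu}\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n))\mathbf{v}(n)\mathbf{x}^{\mathrm T}(n)e_o(n)]\boldsymbol{\mu} \\
& + 2\boldsymbol{\mu}E[e_o(n)\mathbf{x}(n)\mathbf{v}^{\mathrm T}(n)(\mathbf{I} - 2\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)\boldsymbol{\mu})] \\
& + 4\boldsymbol{\mu}E[|e_o(n)|^2\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)]\boldsymbol{\mu} \\
& - E[(\mathbf{I} - 2\boldsymbol{\mu}\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n))\mathbf{v}(n)\boldsymbol{\varepsilon}_o^{\mathrm T}(n)] \\
& - E[\boldsymbol{\varepsilon}_o(n)\mathbf{v}^{\mathrm T}(n)(\mathbf{I} - 2\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)\boldsymbol{\mu})] \\
& - 2\boldsymbol{\mu}E[e_o(n)\mathbf{x}(n)\boldsymbol{\varepsilon}_o^{\mathrm T}(n)] \\
& - 2E[e_o(n)\boldsymbol{\varepsilon}_o(n)\mathbf{x}^{\mathrm T}(n)]\boldsymbol{\mu} \\
& + E[\boldsymbol{\varepsilon}_o(n)\boldsymbol{\varepsilon}_o^{\mathrm T}(n)]
\end{aligned}
\tag{14.7}
$$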


where $\mathbf{K}(n) = E[\mathbf{v}(n)\mathbf{v}^{\mathrm T}(n)]$. According to Assumptions 1-4 of Section 14.1, $e_o(n)$ is
zero-mean and independent of $\mathbf{x}(n)$, $\mathbf{v}(n)$, and $\boldsymbol{\varepsilon}_o(n)$. The independence of $e_o(n)$ and
$\mathbf{v}(n) = \mathbf{w}(n) - \mathbf{w}_o(n)$ follows from the fact that $e_o(n)$ is independent of $\mathbf{w}(n)$ (Assumption
4) and $\boldsymbol{\varepsilon}_o(n)$ (Assumption 2). Consequently, the fifth, sixth, tenth, and eleventh terms on the
right-hand side of Eq. (14.7) become zero. Similarly, the eighth and ninth terms on the
right-hand side of Eq. (14.7) are also zero since $\boldsymbol{\varepsilon}_o(n)$ is zero-mean and independent
of $\mathbf{x}(n)$ and $\mathbf{v}(n)$. The independence of $\boldsymbol{\varepsilon}_o(n)$ and $\mathbf{v}(n)$ follows from the fact that $\mathbf{v}(n)$
is affected only by the past values of $\boldsymbol{\varepsilon}_o(n)$ and, according to Assumption 3, $\boldsymbol{\varepsilon}_o(n)$ is
independent of its past observations. Furthermore, the independence of $\mathbf{x}(n)$ and $\mathbf{v}(n)$
implies that


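$$
\begin{aligned}
E[\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)\mathbf{v}(n)\mathbf{v}^{\mathrm T}(n)] &= E[\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)]E[\mathbf{v}(n)\mathbf{v}^{\mathrm T}(n)] \\
&= \mathbf{R}\mathbf{K}(n)
\end{aligned}
\tag{14.8}
$$

and

$$
\begin{aligned}
E[\mathbf{v}(n)\mathbf{v}^{\mathrm T}(n)\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)] &= E[\mathbf{v}(n)\mathbf{v}^{\mathrm T}(n)]E[\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)] \\
&= \mathbf{K}(n)\mathbf{R}
\end{aligned}
\tag{14.9}
$$

where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)]$. Assumption 2 implies that

$$E[|e_o(n)|^2\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)] = \sigma_{e_o}^2\mathbf{R} \tag{14.10}$$

where $\sigma_{e_o}^2 = E[|e_o(n)|^2]$ is the variance of the zero-mean random variable $e_o(n)$. Finally,
considering the independence of $\mathbf{v}(n)$ and $\mathbf{x}(n)$, assuming that the elements of $\mathbf{x}(n)$
are Gaussian-distributed, and following a similar line of derivations as that which led to
Eq. (6.39) (Appendix 6A), we obtain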


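$$E[\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)\mathbf{v}(n)\mathbf{v}^{\mathrm T}(n)\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)] = \mathbf{R}\,\mathrm{tr}[\mathbf{R}\mathbf{K}(n)] + 2\mathbf{R}\mathbf{K}(n)\mathbf{R} \tag{14.11}$$

Using these results in Eq. (14.7), we obtain

$$
\begin{aligned}
\mathbf{K}(n+1) ={}& \mathbf{K}(n) - 2\boldsymbol{\mu}\mathbf{R}\mathbf{K}(n) - 2\mathbf{K}(n)\mathbf{R}\boldsymbol{\mu} + 4\boldsymbol{\mu}\mathbf{R}\boldsymbol{\mu}\,\mathrm{tr}[\mathbf{R}\mathbf{K}(n)] \\
& + 8\boldsymbol{\mu}\mathbf{R}\mathbf{K}(n)\mathbf{R}\boldsymbol{\mu} + 4\sigma_{e_o}^2\boldsymbol{\mu}\mathbf{R}\boldsymbol{\mu} + \mathbf{G}
\end{aligned}
\tag{14.12}
$$

where

$$\mathbf{G} = E[\boldsymbol{\varepsilon}_o(n)\boldsymbol{\varepsilon}_o^{\mathrm T}(n)] \tag{14.13}$$

is the correlation matrix of the plant tap-weight increments.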
Next, we recall that

$$\xi_{\mathrm{ex}}(n) = E[(\mathbf{v}^{\mathrm T}(n)\mathbf{x}(n))^2] \tag{14.14}$$

where $\xi_{\mathrm{ex}}(n)$ is the excess MSE at time $n$. Using the independence of $\mathbf{v}(n)$ and $\mathbf{x}(n)$ and
following the same line of derivations that led to Eq. (6.26), we obtain

$$E[(\mathbf{v}^{\mathrm T}(n)\mathbf{x}(n))^2] = \mathrm{tr}[\mathbf{R}\mathbf{K}(n)] \tag{14.15}$$

Substituting this result in Eq. (14.14), we obtain

$$\xi_{\mathrm{ex}}(n) = \mathrm{tr}[\mathbf{R}\mathbf{K}(n)] \tag{14.16}$$

Since all the underlying processes are assumed to be stationary (Assumption 1 in
Section 14.1), $\mathbf{K}(n)$ and $\xi_{\mathrm{ex}}(n)$ will be independent of $n$ in the steady state. Hence, the
time index $n$ is dropped from $\mathbf{K}(n)$ and $\xi_{\mathrm{ex}}(n)$ henceforth.

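Premultiplying Eq. (14.12) on both sides by $\boldsymbol{\mu}^{-1}$, taking the trace, and assuming that
the algorithm has reached its steady state so that $\mathbf{K}(n+1) = \mathbf{K}(n) = \mathbf{K}$, we obtain

$$
\mathrm{tr}[\mathbf{R}\mathbf{K}] + \mathrm{tr}[\boldsymbol{\mu}^{-1}\mathbf{K}\mathbf{R}\boldsymbol{\mu}] = 2\,\mathrm{tr}[\mathbf{R}\boldsymbol{\mu}]\,\mathrm{tr}[\mathbf{R}\mathbf{K}] + 4\,\mathrm{tr}[\mathbf{R}\mathbf{K}\mathbf{R}\boldsymbol{\mu}]
+ 2\sigma_{e_o}^2\,\mathrm{tr}[\mathbf{R}\boldsymbol{\mu}] + \frac{1}{2}\mathrm{tr}[\boldsymbol{\mu}^{-1}\mathbf{G}] \tag{14.17}
$$

Next, using the identity $\mathrm{tr}[\mathbf{A}\mathbf{B}] = \mathrm{tr}[\mathbf{B}\mathbf{A}]$, which is true for any pair of M-by-N and
N-by-M matrices $\mathbf{A}$ and $\mathbf{B}$, we get $\mathrm{tr}[\mathbf{R}\boldsymbol{\mu}] = \mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]$, $\mathrm{tr}[\mathbf{R}\mathbf{K}\mathbf{R}\boldsymbol{\mu}] = \mathrm{tr}[\boldsymbol{\mu}\mathbf{R}\mathbf{K}\mathbf{R}]$, and

$$\mathrm{tr}[\boldsymbol{\mu}^{-1}\mathbf{K}\mathbf{R}\boldsymbol{\mu}] = \mathrm{tr}[\mathbf{R}\boldsymbol{\mu}\boldsymbol{\mu}^{-1}\mathbf{K}] = \mathrm{tr}[\mathbf{R}\mathbf{K}]$$

Using these and Eq. (14.16) in Eq. (14.17), we obtain

$$2\xi_{\mathrm{ex}} = 2\,\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]\xi_{\mathrm{ex}} + 4\,\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}\mathbf{K}\mathbf{R}] + 2\sigma_{e_o}^2\,\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}] + \frac{1}{2}\mathrm{tr}[\boldsymbol{\mu}^{-1}\mathbf{G}] \tag{14.18}$$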


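To arrive at a mathematically tractable result, we assume that the term $\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}\mathbf{K}\mathbf{R}]$ in
Eq. (14.18) can be ignored. Numerical examples and computer simulations show that
when $N$ (the filter length) is large, $\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}\mathbf{K}\mathbf{R}]$ is usually at least an order of magnitude
smaller than $\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]\xi_{\mathrm{ex}}$. See Problem P14.1 for further discussion of this approximation.
This leads to the following result:

$$\xi_{\mathrm{ex}} = \frac{1}{1 - \mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]}\left(\sigma_{e_o}^2\,\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}] + \frac{1}{4}\mathrm{tr}[\boldsymbol{\mu}^{-1}\mathbf{G}]\right) \tag{14.19}$$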
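Using Eq. (14.19) to evaluate the misadjustment of the generalized LMS algorithm, we
obtain

$$M = \frac{\xi_{\mathrm{ex}}}{\xi_{\min}} = \frac{1}{1 - \mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]}\left(\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}] + \frac{1}{4\sigma_{e_o}^2}\mathrm{tr}[\boldsymbol{\mu}^{-1}\mathbf{G}]\right) \tag{14.20}$$

where $\xi_{\min} = \sigma_{e_o}^2$ is the minimum MSE of the filter that is obtained when $\mathbf{w}(n) = \mathbf{w}_o(n)$.

To relate this result to the results of the previous chapters, let us consider the case of
the conventional LMS algorithm. In this case, $\boldsymbol{\mu} = \mu\mathbf{I}$, where $\mu$ is a scalar step-size parameter.
Substituting $\boldsymbol{\mu}$ by $\mu\mathbf{I}$ in Eq. (14.20), we obtain

$$M_{\mathrm{LMS}} = \frac{1}{1 - \mu\,\mathrm{tr}[\mathbf{R}]}\left(\mu\,\mathrm{tr}[\mathbf{R}] + \frac{1}{4\mu\sigma_{e_o}^2}\mathrm{tr}[\mathbf{G}]\right) \tag{14.21}$$

It is instructive to note that when the plant tap-weight vector, $\mathbf{w}_o$, is time invariant, the correlation
matrix $\mathbf{G}$ is zero, as $\boldsymbol{\varepsilon}_o(n)$ is zero for all values of $n$. Thus, we obtain

$$M_{\mathrm{LMS}} = \frac{\mu\,\mathrm{tr}[\mathbf{R}]}{1 - \mu\,\mathrm{tr}[\mathbf{R}]} \tag{14.22}$$
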
This is exactly the result that we obtained in Chapter 6; see Eq. (6.63). This observation
shows that Eq. (14.20) is in fact a generalization of similar results that were obtained 
in the previous chapters. This includes the effect of plant variation and also the use of 
different step-size parameters at various taps. Moreover, we see that when the plant is 
time varying, there are two distinct terms contributing to the misadjustment of the LMS 
algorithm. Accordingly, we may write 


$$M = M_1 + M_2 \tag{14.23}$$

where

$$M_1 = \frac{\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]}{1 - \mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]} \tag{14.24}$$

and

$$M_2 = \frac{1}{4\sigma_{e_o}^2}\,\frac{\mathrm{tr}[\boldsymbol{\mu}^{-1}\mathbf{G}]}{1 - \mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]} \tag{14.25}$$

With reference to recursion (14.6) and the subsequent derivations, we find that $M_1$ originates
from the term $2\boldsymbol{\mu}e_o(n)\mathbf{x}(n)$ on the right-hand side of Eq. (14.6). This, clearly, is
contributed by the plant noise, $e_o(n)$. Similarly, one finds that $M_2$ is a direct contribution
of the plant tap-weight increments $\boldsymbol{\varepsilon}_o(n)$. Accordingly, $M_1$ is called the noise
misadjustment, and $M_2$ is referred to as the lag misadjustment. We note that the noise
misadjustment decreases as the step-size parameters, $\mu_i$'s, decrease. On the other hand,
a smaller lag misadjustment is achieved by increasing the step-size parameters. Thus, it
becomes necessary to find a compromise choice of the step-size parameters which will
result in the right balance between the noise and lag misadjustments. This is the subject
of the next section.
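
The opposing trends of the two misadjustment terms are easy to see numerically. The sketch below evaluates Eqs. (14.24) and (14.25) for the conventional LMS case, that is, with a scalar step size. The function name and the operating point used in the loop (ten unit-power taps, tr[G] = 1e-5, plant noise variance 0.01) are hypothetical values chosen only for illustration.

```python
def lms_misadjustments(mu, tr_R, tr_G, sigma_e2):
    """Return (noise, lag) misadjustments for a scalar step size mu.

    Specializes Eqs. (14.24) and (14.25) to the conventional LMS algorithm;
    the expressions are meaningful only when mu * tr_R < 1.
    """
    denom = 1.0 - mu * tr_R
    noise = mu * tr_R / denom                      # M_1, grows with mu
    lag = tr_G / (4.0 * mu * sigma_e2) / denom     # M_2, dominated by 1/mu for small mu
    return noise, lag


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only.
    for mu in (0.001, 0.005, 0.02):
        m1, m2 = lms_misadjustments(mu, tr_R=10.0, tr_G=1e-5, sigma_e2=0.01)
        print(f"mu={mu:.3f}  noise={m1:.4f}  lag={m2:.4f}  total={m1 + m2:.4f}")
```

Running the loop shows the total misadjustment passing through a minimum at an intermediate step size, which is the compromise referred to above.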


14.4 Optimum Step-Size Parameters 


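To derive a set of equations for the optimum step-size parameters that minimize
the excess MSE and thus the misadjustment of the LMS algorithm, we first expand
Eq. (14.19) to obtain

$$\xi_{\mathrm{ex}} = \frac{1}{1 - \sum_{i=0}^{N-1}\mu_i\sigma_{x_i}^2}\sum_{i=0}^{N-1}\left(\mu_i\sigma_{e_o}^2\sigma_{x_i}^2 + \frac{\sigma_{\varepsilon_{o,i}}^2}{4\mu_i}\right) \tag{14.26}$$

where $\sigma_{x_i}^2$ and $\sigma_{\varepsilon_{o,i}}^2$ are, respectively, the variances of $x_i(n)$ and the $i$th element of $\boldsymbol{\varepsilon}_o(n)$,
that is, the diagonal elements of the respective correlation matrices, $\mathbf{R}$ and $\mathbf{G}$.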

The optimum values of the step-size parameters are obtained by setting the derivatives
of $\xi_{\mathrm{ex}}$ with respect to the $\mu_i$'s equal to zero. Solving the set of simultaneous equations

$$\frac{\partial\xi_{\mathrm{ex}}}{\partial\mu_i} = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{14.27}$$

we obtain (Problem P14.4)

$$\mu_{o,i} = \frac{\sigma_{\varepsilon_{o,i}}}{2\sigma_{x_i}\sqrt{\xi_{\mathrm{ex},o} + \sigma_{e_o}^2}}, \quad i = 0, 1, \ldots, N-1 \tag{14.28}$$

where the subscript "o" is added to the $\mu_i$'s to emphasize that they are the optimum values of
the step-size parameters. Moreover, $\xi_{\mathrm{ex},o}$ refers to the excess MSE when the optimum step-size
parameters, $\mu_{o,i}$'s, are used. This solution, of course, is not complete, as $\xi_{\mathrm{ex},o}$ depends
on the $\mu_{o,i}$'s. To complete the solution, we define $\eta = \sqrt{\xi_{\mathrm{ex},o} + \sigma_{e_o}^2}$ and replace Eq. (14.28)
in Eq. (14.26). This results in a second-order equation in $\eta$ whose solutions are


$$\eta = \frac{\sum_i\sigma_{\varepsilon_{o,i}}\sigma_{x_i} \pm \sqrt{\left(\sum_i\sigma_{\varepsilon_{o,i}}\sigma_{x_i}\right)^2 + 4\sigma_{e_o}^2}}{2}$$

Noting that $\eta$ cannot be negative, we find that

$$\eta = \frac{\sum_i\sigma_{\varepsilon_{o,i}}\sigma_{x_i} + \sqrt{\left(\sum_i\sigma_{\varepsilon_{o,i}}\sigma_{x_i}\right)^2 + 4\sigma_{e_o}^2}}{2} \tag{14.29}$$

is the only acceptable solution of $\eta$. With this, we obtain

$$\mu_{o,i} = \frac{\sigma_{\varepsilon_{o,i}}}{2\eta\sigma_{x_i}}, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{14.30}$$

It is instructive to note that Eq. (14.30) is intuitively sound. It suggests that those
taps that have larger tap perturbation should be given larger step-size parameters. It also
suggests normalization of the step-size parameters proportional to the inverse of the signal
level at various taps. However, this normalization is different from the one commonly
used in the step-normalized algorithms, where $\mu_i$ is selected proportional to the inverse
of signal power at the respective tap, that is, proportional to $1/\sigma_{x_i}^2$. Moreover, Eq. (14.30)
suggests that the step-size parameters should be reduced as the error level at the filter
output increases; note that $\eta^2$ is equal to the MSE of the filter after it has converged.
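
Equations (14.26), (14.29), and (14.30) translate directly into a short numerical routine. The sketch below is illustrative only: the function names and the sample standard deviations in the usage section are assumptions, and the tap-input and increment statistics are taken as known, which is rarely the case in practice.

```python
import math


def optimum_step_sizes(sigma_x, sigma_eps, sigma_e2):
    """Return (mu_o, eta) from Eqs. (14.29) and (14.30).

    sigma_x   : standard deviations of the tap inputs x_i(n)
    sigma_eps : standard deviations of the plant tap-weight increments
    sigma_e2  : variance of the plant noise e_o(n)
    """
    gamma = sum(sx * se for sx, se in zip(sigma_x, sigma_eps))
    eta = 0.5 * (gamma + math.sqrt(gamma * gamma + 4.0 * sigma_e2))     # Eq. (14.29)
    mu_o = [se / (2.0 * eta * sx) for sx, se in zip(sigma_x, sigma_eps)]  # Eq. (14.30)
    return mu_o, eta


def excess_mse(mu, sigma_x, sigma_eps, sigma_e2):
    """Excess MSE of Eq. (14.26) for a given set of step sizes."""
    tr_mu_r = sum(m * sx * sx for m, sx in zip(mu, sigma_x))
    total = sum(m * sigma_e2 * sx * sx + se * se / (4.0 * m)
                for m, sx, se in zip(mu, sigma_x, sigma_eps))
    return total / (1.0 - tr_mu_r)


if __name__ == "__main__":
    sx, se, se2 = [1.0, 0.5], [0.01, 0.03], 0.01   # hypothetical statistics
    mu_o, eta = optimum_step_sizes(sx, se, se2)
    print(mu_o, excess_mse(mu_o, sx, se, se2), eta * eta - se2)
```

The last two printed numbers should agree, since $\eta^2 = \xi_{\mathrm{ex},o} + \sigma_{e_o}^2$ by definition; this gives a quick consistency check on any set of input statistics.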

The validity of Eq. (14.30) is subject to the condition that the optimum step-size
parameters remain in a range that does not result in instability of the algorithm. For
the case of the conventional LMS algorithm, where a single step-size parameter, $\mu$, is
employed, a useful and practically applicable upper bound for $\mu$ is the one derived in
Chapter 6 and repeated below for convenience (Eq. (6.74)):

$$\mu < \frac{1}{3\,\mathrm{tr}[\mathbf{R}]} \tag{14.31}$$

or, equivalently,

$$\mu\,\mathrm{tr}[\mathbf{R}] < \frac{1}{3} \tag{14.32}$$


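This result can be extended to the generalized LMS recursion Eq. (14.3) as follows.

Consider the recursion Eq. (14.3) and define $\tilde{\mathbf{x}}(n) = \boldsymbol{\mu}^{1/2}\mathbf{x}(n)$, where $\boldsymbol{\mu}^{1/2}$ is the diagonal
matrix consisting of the square roots of the diagonal elements of $\boldsymbol{\mu}$. Then, multiplying
both sides of Eq. (14.6) from the left by $\boldsymbol{\mu}^{-1/2}$ and, also, defining $\tilde{\mathbf{v}}(n) = \boldsymbol{\mu}^{-1/2}\mathbf{v}(n)$ and
$\tilde{\boldsymbol{\varepsilon}}_o(n) = \boldsymbol{\mu}^{-1/2}\boldsymbol{\varepsilon}_o(n)$, we obtain

$$\tilde{\mathbf{v}}(n+1) = (\mathbf{I} - 2\tilde{\mathbf{x}}(n)\tilde{\mathbf{x}}^{\mathrm T}(n))\tilde{\mathbf{v}}(n) + 2e_o(n)\tilde{\mathbf{x}}(n) - \tilde{\boldsymbol{\varepsilon}}_o(n) \tag{14.33}$$

The recursion Eq. (14.33) is similar to the conventional LMS recursion with $\mu = 1$.
Accordingly, Eq. (14.32) can be applied. Hence, we find that the stability of Eq. (14.33),
and thus Eq. (14.6) or, equivalently, the recursion Eq. (14.3), is guaranteed if

$$\mathrm{tr}[\tilde{\mathbf{R}}] < \frac{1}{3} \tag{14.34}$$

where

$$
\begin{aligned}
\tilde{\mathbf{R}} &= E[\tilde{\mathbf{x}}(n)\tilde{\mathbf{x}}^{\mathrm T}(n)] \\
&= E[\boldsymbol{\mu}^{1/2}\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)\boldsymbol{\mu}^{1/2}] \\
&= \boldsymbol{\mu}^{1/2}E[\mathbf{x}(n)\mathbf{x}^{\mathrm T}(n)]\boldsymbol{\mu}^{1/2} \\
&= \boldsymbol{\mu}^{1/2}\mathbf{R}\boldsymbol{\mu}^{1/2}
\end{aligned}
\tag{14.35}
$$

Substituting this result in Eq. (14.34) and noting that $\mathrm{tr}[\boldsymbol{\mu}^{1/2}\mathbf{R}\boldsymbol{\mu}^{1/2}] = \mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]$ (according
to the identity $\mathrm{tr}[\mathbf{A}\mathbf{B}] = \mathrm{tr}[\mathbf{B}\mathbf{A}]$), we get

$$\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}] < \frac{1}{3} \tag{14.36}$$

This is a sufficient condition that may be imposed on the algorithm step-size parameters,
$\mu_i$'s, to guarantee the stability of the generalized LMS recursion Eq. (14.3).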




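When Eq. (14.36) holds, the minimum excess MSE of the filter, $\xi_{\mathrm{ex},o}$, is obtained by
substituting Eq. (14.30) in Eq. (14.26). This gives

$$\xi_{\mathrm{ex},o} = \frac{1}{1 - \frac{1}{2\eta}\sum_i\sigma_{\varepsilon_{o,i}}\sigma_{x_i}}\left(\frac{\sigma_{e_o}^2}{2\eta} + \frac{\eta}{2}\right)\sum_i\sigma_{\varepsilon_{o,i}}\sigma_{x_i} \tag{14.37}$$

Substituting for $\eta$ from Eq. (14.29) in Eq. (14.37), we get

$$\xi_{\mathrm{ex},o} = \frac{\sum_i\sigma_{\varepsilon_{o,i}}\sigma_{x_i} + \sqrt{\left(\sum_i\sigma_{\varepsilon_{o,i}}\sigma_{x_i}\right)^2 + 4\sigma_{e_o}^2}}{2}\,\sum_i\sigma_{\varepsilon_{o,i}}\sigma_{x_i} \tag{14.38}$$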


14.5 Comparisons of Conventional Algorithms 


In this section, we compare tracking behavior of various versions of the LMS algorithm 
in the context of the modeling problem discussed in the last few sections. Noting that 
the tracking behaviors of the RLS and LMS-Newton algorithms are about the same, 
the comparisons also cover the RLS algorithm. The indicator of better tracking behavior 
(performance) is lower steady-state excess MSE. 

To avoid branching into too many cases, we concentrate on the comparison
of the direct implementation of a transversal filter, using the LMS algorithm,
and its implementation in the transform domain. We note that for a transversal filter $\mathbf{x}(n) =
[x(n)\ x(n-1)\ \cdots\ x(n-N+1)]^{\mathrm T}$, and for its transform-domain implementation $\mathbf{x}(n)$ is
replaced by $\mathbf{x}_T(n) = \mathbf{T}\mathbf{x}(n)$, where $\mathbf{T}$ is an orthonormal transformation matrix satisfying
the condition¹

$$\mathbf{T}\mathbf{T}^{\mathrm T} = \mathbf{I} \tag{14.39}$$

where $\mathbf{I}$ is the identity matrix.
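In addition, if $\boldsymbol{\varepsilon}_o(n)$ represents the plant tap-weight increments in its transversal form,
the corresponding increments in the transform domain are given by

$$\boldsymbol{\varepsilon}_{T,o}(n) = \mathbf{T}\boldsymbol{\varepsilon}_o(n) \tag{14.40}$$

We also define $\mathbf{R}_T = E[\mathbf{x}_T(n)\mathbf{x}_T^{\mathrm T}(n)]$ and $\mathbf{G}_T = E[\boldsymbol{\varepsilon}_{T,o}(n)\boldsymbol{\varepsilon}_{T,o}^{\mathrm T}(n)]$ and note that

$$\mathbf{R}_T = \mathbf{T}\mathbf{R}\mathbf{T}^{\mathrm T} \tag{14.41}$$

and

$$\mathbf{G}_T = \mathbf{T}\mathbf{G}\mathbf{T}^{\mathrm T} \tag{14.42}$$

The $i$th diagonal elements of $\mathbf{R}_T$ and $\mathbf{G}_T$ are denoted as $\sigma_{x_{T,i}}^2$ and $\sigma_{\varepsilon_{T,o,i}}^2$, respectively.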


Moreover, to simplify our discussion, yet with no loss of generality, we assume that
the input sequence to the transversal filter is normalized to unit power, that is, $\sigma_{x_i}^2 =
E[|x(n-i)|^2] = 1$, for $i = 0, 1, \ldots, N-1$. Then, the orthonormality of $\mathbf{T}$, that is, the
condition (14.39), implies that

$$\sum_{i=0}^{N-1}\sigma_{x_{T,i}}^2 = \sum_{i=0}^{N-1}\sigma_{x_i}^2 = N \tag{14.43}$$

¹ To avoid complex-valued coefficients/variables in our formulations in this chapter, we only consider transformations
with real-valued coefficients.


We note that in the case of the conventional LMS algorithm, a single step-size parameter,
$\mu$, is used for all taps. On the other hand, in the case of the TDLMS algorithm,
different step-size parameters are used for various taps and they are selected according to
Eq. (14.4). Furthermore, for a fixed misadjustment, say $M$, we have (Eqs. (6.64) and (7.31))

$$\mu = \frac{M}{\mathrm{tr}[\mathbf{R}]} = \frac{M}{\sum_{i=0}^{N-1}\sigma_{x_i}^2} \tag{14.44}$$

and

$$\mu' = \frac{M}{N} \tag{14.45}$$

Thus, in the light of Eq. (14.43), we find that $\mu' = \mu$ in this case.
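Using the above results in Eq. (14.26), the excess MSE of the conventional LMS and
TDLMS algorithms are obtained as

$$\xi_{\mathrm{ex}}(\mathrm{LMS}) = \frac{1}{1 - \mu N}\left(\mu N\sigma_{e_o}^2 + \frac{1}{4\mu}\sum_{i=0}^{N-1}\sigma_{\varepsilon_{o,i}}^2\right) \tag{14.46}$$

and

$$\xi_{\mathrm{ex}}(\mathrm{TDLMS}) = \frac{1}{1 - \mu N}\left(\mu N\sigma_{e_o}^2 + \frac{1}{4\mu}\sum_{i=0}^{N-1}\sigma_{\varepsilon_{T,o,i}}^2\sigma_{x_{T,i}}^2\right) \tag{14.47}$$

respectively. In arriving at Eqs. (14.46) and (14.47), we made use of the assumption
$\sigma_{x_i}^2 = 1$, for $i = 0, 1, \ldots, N-1$, along with Eqs. (14.43) and (14.4).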
Now, let us consider a few specific cases: 


Case 1: $\mathbf{G} = \sigma_{\varepsilon_o}^2\mathbf{I}$ and $\mathbf{R}$ is arbitrary.
Substituting $\mathbf{G} = \sigma_{\varepsilon_o}^2\mathbf{I}$ in Eq. (14.42) and recalling Eq. (14.39), we obtain $\mathbf{G}_T =
\sigma_{\varepsilon_o}^2\mathbf{I}$, which implies that

$$\sigma_{\varepsilon_{T,o,i}}^2 = \sigma_{\varepsilon_{o,i}}^2 = \sigma_{\varepsilon_o}^2, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{14.48}$$

Substituting Eqs. (14.48) and (14.43) in Eq. (14.47), we obtain

$$
\begin{aligned}
\xi_{\mathrm{ex}}(\mathrm{TDLMS}) &= \frac{1}{1 - \mu N}\left(\mu N\sigma_{e_o}^2 + \frac{\sigma_{\varepsilon_o}^2}{4\mu}\sum_{i=0}^{N-1}\sigma_{x_{T,i}}^2\right) \\
&= \frac{N}{1 - \mu N}\left(\mu\sigma_{e_o}^2 + \frac{\sigma_{\varepsilon_o}^2}{4\mu}\right)
\end{aligned}
\tag{14.49}
$$

From this result, we see that when $\mathbf{G} = \sigma_{\varepsilon_o}^2\mathbf{I}$, $\xi_{\mathrm{ex}}(\mathrm{TDLMS})$ is independent of
$\mathbf{T}$. Furthermore, with $\mathbf{G} = \sigma_{\varepsilon_o}^2\mathbf{I}$, Eq. (14.46) also simplifies to Eq. (14.49).
This, in turn, means that independent of the transformation used, the tracking
performance of the TDLMS algorithm remains similar to that of the conventional
LMS algorithm. Furthermore, noting that the LMS-Newton algorithm is
equivalent to the TDLMS algorithm when the KLT is used as its transformation,
this conclusion also applies to the comparison of the conventional LMS and
LMS-Newton algorithms. Moreover, noting that the RLS and LMS-Newton
algorithms have similar tracking behavior (Section 14.2), we may also add that
in this case the conventional LMS and RLS algorithms have similar tracking
behavior.
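Case 2: $\mathbf{R} = \mathbf{I}$ and $\mathbf{G}$ is arbitrary.
Using $\mathbf{R} = \mathbf{I}$ in Eq. (14.41), we find that $\mathbf{R}_T$ is also equal to the identity matrix.
Thus,

$$\sigma_{x_{T,i}}^2 = 1, \quad \text{for } i = 0, 1, \ldots, N-1$$

Using this in Eq. (14.47), we obtain

$$\xi_{\mathrm{ex}}(\mathrm{TDLMS}) = \frac{1}{1 - \mu N}\left(\mu N\sigma_{e_o}^2 + \frac{1}{4\mu}\sum_{i=0}^{N-1}\sigma_{\varepsilon_{T,o,i}}^2\right) \tag{14.50}$$

Now,

$$
\begin{aligned}
\sum_{i=0}^{N-1}\sigma_{\varepsilon_{T,o,i}}^2 &= \mathrm{tr}[\mathbf{G}_T] = \mathrm{tr}[\mathbf{T}\mathbf{G}\mathbf{T}^{\mathrm T}] = \mathrm{tr}[\mathbf{T}^{\mathrm T}\mathbf{T}\mathbf{G}] \\
&= \mathrm{tr}[\mathbf{G}] = \sum_{i=0}^{N-1}\sigma_{\varepsilon_{o,i}}^2
\end{aligned}
\tag{14.51}
$$

where we have used Eqs. (14.39) and (14.42), and the identity $\mathrm{tr}[\mathbf{A}\mathbf{B}] = \mathrm{tr}[\mathbf{B}\mathbf{A}]$.
Substituting Eq. (14.51) in Eq. (14.50), we obtain

$$\xi_{\mathrm{ex}}(\mathrm{TDLMS}) = \frac{1}{1 - \mu N}\left(\mu N\sigma_{e_o}^2 + \frac{1}{4\mu}\sum_{i=0}^{N-1}\sigma_{\varepsilon_{o,i}}^2\right) \tag{14.52}$$

Comparing this with Eq. (14.46), we find that in this case also, irrespective of the
transformation $\mathbf{T}$, there is no difference between the tracking behaviors of the
conventional LMS and TDLMS algorithms. Thus, all the conclusions drawn for
Case 1 continue to hold for Case 2 also, that is, the conventional LMS, TDLMS,
LMS-Newton, and RLS algorithms all have similar tracking behavior.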
Case 3: $\mathbf{R}$ and $\mathbf{G}$ are arbitrary.
From Eq. (14.47), we note that to study the variation of the excess MSE of the
TDLMS algorithm for different choices of $\mathbf{T}$, we need to study the summation

$$\sum_{i=0}^{N-1}\sigma_{\varepsilon_{T,o,i}}^2\sigma_{x_{T,i}}^2 \tag{14.53}$$

Moreover, we note that the orthonormality of $\mathbf{T}$, that is, the identity
$\mathbf{T}\mathbf{T}^{\mathrm T} = \mathbf{I}$, implies that the summations $\sum_i\sigma_{x_{T,i}}^2$ and $\sum_i\sigma_{\varepsilon_{T,o,i}}^2$ are independent
of $\mathbf{T}$. However, the individual terms under the summations, that is, the $\sigma_{x_{T,i}}^2$'s
and $\sigma_{\varepsilon_{T,o,i}}^2$'s, vary with $\mathbf{T}$. Thus, while the summations $\sum_i\sigma_{x_{T,i}}^2$ and $\sum_i\sigma_{\varepsilon_{T,o,i}}^2$
are fixed for different choices of $\mathbf{T}$, the distributions of the terms $\sigma_{x_{T,i}}^2$'s and
$\sigma_{\varepsilon_{T,o,i}}^2$'s vary with $\mathbf{T}$. These distributions also depend on the correlation matrices
$\mathbf{R}$ and $\mathbf{G}$. When $\mathbf{R}$ and $\mathbf{G}$ are arbitrary, these distributions are also arbitrary. As
a result, we find that when no prior information about $\mathbf{R}$ and $\mathbf{G}$ is available,
nothing can be said about the summation Eq. (14.53), and thus no specific
comment can be made about the tracking behavior of various algorithms. The
following numerical example clarifies this further.

Let

$$\mathbf{R} = \begin{bmatrix} 1.0 & 0.5 \\ 0.5 & 1.0 \end{bmatrix} \quad \text{and} \quad \mathbf{G} = \begin{bmatrix} 0.0010 & 0.0008 \\ 0.0008 & 0.0100 \end{bmatrix}$$

In addition, define

$$\mathbf{T} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$

This is an arbitrary 2-by-2 orthonormal transformation matrix that varies with $\theta$.
Table 14.1 presents the summary of the results that we have obtained for the LMS
and TDLMS algorithms for two choices of $\theta = \pi/8$ and $\pi/4$. It is noted that in
the case of $\theta = \pi/8$, the TDLMS shows a better tracking behavior than the LMS
algorithm (compare the summations in the last line of Table 14.1). However,
the LMS algorithm behaves better when $\theta = \pi/4$ is chosen. Incidentally, here,
$\theta = \pi/4$ makes $\mathbf{T}$ correspond to the KLT of the filter input, for which the
TDLMS algorithm is also equivalent to the LMS-Newton algorithm.
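
The entries of Table 14.1 can be reproduced with a few lines of code. The sketch below is an illustration only; the helper names are assumptions. It evaluates the diagonal elements of $\mathbf{T}\mathbf{R}\mathbf{T}^{\mathrm T}$ and $\mathbf{T}\mathbf{G}\mathbf{T}^{\mathrm T}$ for the 2-by-2 rotation matrix defined above and forms the summation of Eq. (14.53); setting $\theta = 0$ gives the untransformed (conventional LMS) case.

```python
import math


def transformed_diagonal(m, theta):
    """Diagonal of T M T^T for the 2-by-2 rotation T and a symmetric matrix M."""
    c, s = math.cos(theta), math.sin(theta)
    m00, m01, m11 = m[0][0], m[0][1], m[1][1]
    d0 = c * c * m00 + 2.0 * c * s * m01 + s * s * m11   # first row of T: [c, s]
    d1 = s * s * m00 - 2.0 * c * s * m01 + c * c * m11   # second row of T: [-s, c]
    return d0, d1


R = [[1.0, 0.5], [0.5, 1.0]]
G = [[0.0010, 0.0008], [0.0008, 0.0100]]


def tracking_sum(theta):
    """Summation of Eq. (14.53) for the example matrices R and G."""
    var_x = transformed_diagonal(R, theta)
    var_eps = transformed_diagonal(G, theta)
    return sum(vx * ve for vx, ve in zip(var_x, var_eps))


if __name__ == "__main__":
    for name, theta in (("LMS", 0.0), ("pi/8", math.pi / 8), ("pi/4", math.pi / 4)):
        print(name, transformed_diagonal(R, theta),
              transformed_diagonal(G, theta), round(tracking_sum(theta), 4))
```

A smaller value of the summation corresponds to a lower excess MSE in Eq. (14.47), so the ordering of the three printed sums mirrors the comparison made in the text.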


The comparisons given above assume that we have no information about the correlation 
matrix G of the plant tap-weight increments. Thus, the optimum step-size parameters 
derived in the last section Eq. (14.30) could not be used. In Section 14.7, we show that 
the optimum step-size parameters can, in fact, be obtained adaptively using the variable 
step-size least mean square (VSLMS) algorithm introduced in Chapter 6 (Section 6.7). 
Noting this, we consider using the optimum step-size parameters given by Eq. (14.30) 
and present some more comparisons of the various algorithms in the next section. 


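Table 14.1 Comparison of the conventional LMS and TDLMS for a numerical example.

LMS                                            TDLMS                                                        θ = π/8    θ = π/4

$\sigma_{x_0}^2$ = 1.0000                      $\sigma_{x_{T,0}}^2$                                         1.3536     1.5000

$\sigma_{x_1}^2$ = 1.0000                      $\sigma_{x_{T,1}}^2$                                         0.6464     0.5000

$\sigma_{\varepsilon_{o,0}}^2$ = 0.0010        $\sigma_{\varepsilon_{T,o,0}}^2$                             0.0029     0.0063

$\sigma_{\varepsilon_{o,1}}^2$ = 0.0100        $\sigma_{\varepsilon_{T,o,1}}^2$                             0.0081     0.0047

$\sum_i\sigma_{\varepsilon_{o,i}}^2$ = 0.0110  $\sum_i\sigma_{\varepsilon_{T,o,i}}^2\sigma_{x_{T,i}}^2$     0.0091     0.0118


14.6 Comparisons Based on Optimum Step-Size Parameters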


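From the theoretical results of Section 14.4 and the definitions of $\mathbf{R}_T$ and $\mathbf{G}_T$ in the last
section, we recall that when the optimum step-size parameters given by Eq. (14.30) are
used, the excess MSE of the TDLMS algorithm is given by Eq. (14.38)

$$\xi_{\mathrm{ex},o}(\mathrm{TDLMS}) = \frac{\Gamma_T + \sqrt{\Gamma_T^2 + 4\sigma_{e_o}^2}}{2}\,\Gamma_T \tag{14.54}$$

where

$$\Gamma_T = \sum_{i=0}^{N-1}\sigma_{\varepsilon_{T,o,i}}\sigma_{x_{T,i}} \tag{14.55}$$

We note that $\Gamma_T$ is a function of $\mathbf{R}$, $\mathbf{G}$, and $\mathbf{T}$.
We also note that when no transformation has been applied, but the optimum step-size
parameters are used for different taps, the excess MSE of the LMS algorithm is given by

$$\xi_{\mathrm{ex},o}(\mathrm{LMS}) = \frac{\Gamma_I + \sqrt{\Gamma_I^2 + 4\sigma_{e_o}^2}}{2}\,\Gamma_I \tag{14.56}$$

where

$$\Gamma_I = \sum_{i=0}^{N-1}\sigma_{\varepsilon_{o,i}}\sigma_{x_i} \tag{14.57}$$
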


Clearly, to achieve the best tracking performance of the TDLMS algorithm, one should
find the matrix $\mathbf{T}$ that minimizes $\Gamma_T$. A general solution to this problem appears to
be difficult. We thus limit ourselves to a few particular cases whose study is found to be
instructive. The following lemma is widely used in the study of the cases that follow.


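Lemma 14.1 Consider the diagonal matrix $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_0, \lambda_1, \ldots, \lambda_{N-1})$, where the $\lambda_i$'s
are all real and nonnegative. If $\mathbf{T}$ is an orthonormal matrix, that is, $\mathbf{T}\mathbf{T}^{\mathrm T} = \mathbf{I}$, and
$\mathbf{S} = \mathbf{T}\boldsymbol{\Lambda}\mathbf{T}^{\mathrm T}$, then the following inequality always holds:

$$\sum_{i=0}^{N-1}\sqrt{\lambda_i} \le \sum_{i=0}^{N-1}\sqrt{s_i} \tag{14.58}$$

where $s_i$ is the $i$th diagonal element of $\mathbf{S}$.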


Proof. We first note that for $x \ge 0$, $f(x) = \sqrt{x}$ is a concave function. In addition, according
to the theory of convex functions (Rockafellar, 1970), if $f(x)$ is a concave function
and $\zeta_0, \zeta_1, \ldots, \zeta_{N-1}$ are a set of nonnegative numbers that satisfy $\sum_{i=0}^{N-1}\zeta_i = 1$, then
for any set of numbers $x_0, x_1, \ldots, x_{N-1}$ in the domain of $f(x)$, the following inequality
holds

$$\sum_{i=0}^{N-1}\zeta_i f(x_i) \le f\left(\sum_{i=0}^{N-1}\zeta_i x_i\right) \tag{14.59}$$

Next, we note that

$$s_i = \sum_{l=0}^{N-1}\lambda_l\tau_{il}^2 \tag{14.60}$$

where $\tau_{il}$ is the $il$th element of $\mathbf{T}$. In addition, the orthonormality of $\mathbf{T}$ implies that every
row and every column of $\mathbf{T}$ has unit norm; in particular,

$$\sum_{i=0}^{N-1}\tau_{il}^2 = 1 \tag{14.61}$$

Choosing $\zeta_l = \tau_{il}^2$, $x_l = \lambda_l$, and $f(x) = \sqrt{x}$ in Eq. (14.59), and using Eq. (14.60), we
obtain

$$\sum_{l=0}^{N-1}\tau_{il}^2\sqrt{\lambda_l} \le \sqrt{s_i} \tag{14.62}$$

Summing up both sides of Eq. (14.62) over $i = 0, 1, \ldots, N-1$ and using Eq. (14.61)
completes the proof.
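
Lemma 14.1 can also be checked numerically. The sketch below is not a proof and is not taken from this book: it builds a random orthonormal matrix as a product of plane (Givens) rotations, which is one convenient construction among many, computes the diagonal elements $s_i$ from Eq. (14.60), and returns the two sides of Eq. (14.58).

```python
import math
import random


def random_orthonormal(n, rng):
    """Return an n-by-n orthonormal matrix built from random plane rotations."""
    t = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(4 * n):
        p, q = rng.sample(range(n), 2)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        c, s = math.cos(theta), math.sin(theta)
        for j in range(n):                      # rotate rows p and q
            tp, tq = t[p][j], t[q][j]
            t[p][j] = c * tp + s * tq
            t[q][j] = -s * tp + c * tq
    return t


def lemma_sides(lam, t):
    """Return the left- and right-hand sides of Eq. (14.58)."""
    n = len(lam)
    s_diag = [sum(lam[l] * t[i][l] ** 2 for l in range(n)) for i in range(n)]  # Eq. (14.60)
    lhs = sum(math.sqrt(v) for v in lam)
    rhs = sum(math.sqrt(v) for v in s_diag)
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    lam = [rng.uniform(0.0, 1.0) for _ in range(4)]
    print(lemma_sides(lam, random_orthonormal(4, rng)))
```

Equality is approached when the transform leaves the matrix diagonal, for example when $\mathbf{T}$ is a permutation or when all the $\lambda_i$'s are equal.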


We are now ready to consider a few specific cases: 


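Case 1: $\mathbf{R} = \mathbf{I}$ and $\mathbf{G}$ is an arbitrary diagonal matrix.
The assumption $\mathbf{R} = \mathbf{I}$ and the orthonormality of $\mathbf{T}$ imply that $\mathbf{R}_T = \mathbf{I}$. Thus,

$$\sigma_{x_i}^2 = \sigma_{x_{T,i}}^2 = 1, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{14.63}$$

Using Eq. (14.63) in Eqs. (14.57) and (14.55), we get

$$\Gamma_I = \sum_{i=0}^{N-1}\sigma_{\varepsilon_{o,i}} \tag{14.64}$$

and

$$\Gamma_T = \sum_{i=0}^{N-1}\sigma_{\varepsilon_{T,o,i}} \tag{14.65}$$

respectively. On the other hand, noting that $\mathbf{G}$ is a diagonal matrix consisting of
the elements $\sigma_{\varepsilon_{o,0}}^2, \sigma_{\varepsilon_{o,1}}^2, \ldots, \sigma_{\varepsilon_{o,N-1}}^2$, that the diagonal elements of $\mathbf{G}_T = \mathbf{T}\mathbf{G}\mathbf{T}^{\mathrm T}$ are
$\sigma_{\varepsilon_{T,o,0}}^2, \sigma_{\varepsilon_{T,o,1}}^2, \ldots, \sigma_{\varepsilon_{T,o,N-1}}^2$, and using the above Lemma, we find that

$$\Gamma_I \le \Gamma_T \tag{14.66}$$

Using Eq. (14.66) in Eqs. (14.54) and (14.56), we find that in this case

$$\xi_{\mathrm{ex},o}(\mathrm{LMS}) \le \xi_{\mathrm{ex},o}(\mathrm{TDLMS}) \tag{14.67}$$

That is, when $\mathbf{R} = \mathbf{I}$ and $\mathbf{G}$ is diagonal, and the optimum step-size parameters,
$\mu_{o,i}$'s, are used, there is no transformation that can improve the tracking behavior
of the LMS algorithm.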

Case 2: $\mathbf{R} = \mathbf{I}$ and $\mathbf{G}$ is arbitrary.
Let $\mathbf{T} = \mathbf{T}_1$ be the orthonormal transform which results in a diagonal matrix
$\mathbf{G}_{T_1} = \mathbf{T}_1\mathbf{G}\mathbf{T}_1^{\mathrm T}$. Using $\mathbf{T} = \mathbf{T}_1$ leads to a TDLMS algorithm in which $\mathbf{R}_{T_1} =
\mathbf{T}_1\mathbf{R}\mathbf{T}_1^{\mathrm T} = \mathbf{I}$ (since $\mathbf{R} = \mathbf{I}$), and $\mathbf{G}_{T_1}$ is diagonal. This is similar to Case 1, above.
Hence, for the same reason as in Case 1, we can argue that the choice of $\mathbf{T} = \mathbf{T}_1$
results in a TDLMS algorithm with optimum tracking behavior.

Case 3: $\mathbf{G} = \sigma_{\varepsilon_o}^2\mathbf{I}$ and $\mathbf{R}$ is arbitrary.
Following the same line of reasoning as in Case 2, we find that here the optimum
transform, $\mathbf{T}$, which results in a TDLMS algorithm with the best tracking
behavior is the one that results in a diagonal $\mathbf{R}_T = \mathbf{T}\mathbf{R}\mathbf{T}^{\mathrm T}$. That is, here, the
optimum transform for achieving the best tracking is the KLT.


14.7  VSLMS: An Algorithm with Optimum Tracking Behavior 


The variable step-size least mean square (VSLMS) algorithm was introduced in Chapter 6,
on the basis of an intuitive understanding of the behavior of the LMS algorithm. In this
section, we present a formal derivation² of the VSLMS algorithm as an adaptive filtering
scheme with optimal tracking behavior.


14.7.1 Derivation of VSLMS Algorithm 


From the results presented in Section 14.3, we observe that in a time-varying environment,
the steady-state MSE of an adaptive filter varies with the step-size parameters. Moreover, a
study of the excess MSE, $\xi_{\mathrm{ex}}$, shows that it is a convex function of the step-size parameters,
$\mu_i$'s, when these vary over a range that does not result in instability (see Eq. (14.26)). This implies
that the MSE

$$\xi = E[e^2(n)] = \xi_{\min} + \xi_{\mathrm{ex}}$$

is also a convex function of the step-size parameters. With this concept in mind, one
may suggest the following gradient search method for finding the optimum step-size
parameters, $\mu_{o,i}$'s, of the LMS algorithm:

$$\mu_i(n) = \mu_i(n-1) - \rho\frac{\partial\xi}{\partial\mu_i(n-1)} \tag{14.68}$$

where $\rho$ is a small positive adaptation parameter.
In analogy with the LMS algorithm, the stochastic version of the gradient recursion
(14.68) is

$$\mu_i(n) = \mu_i(n-1) - \rho\frac{\partial e^2(n)}{\partial\mu_i(n-1)} \tag{14.69}$$

Recalling that

$$
\begin{aligned}
e(n) &= d(n) - \mathbf{w}^{\mathrm T}(n)\mathbf{x}(n) \\
&= d(n) - \sum_{l=0}^{N-1}w_l(n)x_l(n)
\end{aligned}
\tag{14.70}
$$

we obtain

$$
\begin{aligned}
\frac{\partial e^2(n)}{\partial\mu_i(n-1)} &= 2e(n)\frac{\partial e(n)}{\partial\mu_i(n-1)} \\
&= -2e(n)x_i(n)\frac{\partial w_i(n)}{\partial\mu_i(n-1)}
\end{aligned}
\tag{14.71}
$$

where we have noted that in the summation $\sum_l w_l(n)x_l(n)$ only $w_i(n)$ varies with
$\mu_i(n-1)$. The tap weight $w_i(n)$ is related to $\mu_i(n-1)$ according to the LMS recursion

$$w_i(n) = w_i(n-1) + 2\mu_i(n-1)e(n-1)x_i(n-1) \tag{14.72}$$

Substituting Eq. (14.72) in Eq. (14.71), and defining

$$g_i(n) = -2e(n)x_i(n) \tag{14.73}$$

we get

$$\frac{\partial e^2(n)}{\partial\mu_i(n-1)} = -g_i(n)g_i(n-1) \tag{14.74}$$

Finally, substituting Eq. (14.74) in Eq. (14.69), we obtain

$$\mu_i(n) = \mu_i(n-1) + \rho g_i(n)g_i(n-1) \tag{14.75}$$

for $i = 0, 1, \ldots, N-1$. This is the recursion Eq. (6.138), which was introduced in
Chapter 6.

² The derivation presented here first appeared in Mathews and Xie (1990).

From Eq. (14.75), we note that the step-size parameters, $\mu_i(n)$'s, settle near their
steady-state values when

$$E[g_i(n)g_i(n-1)] = 0, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{14.76}$$

Accordingly, a rigorous proof of the optimality of the VSLMS algorithm may be given by
solving Eq. (14.76) for a set of unknown step-size parameters and showing that its solution
matches the optimum parameters as given in Eq. (14.30). In fact, some researchers have
proved this to be true for some specific cases (Mathews and Xie, 1993; Farhang-Boroujeny
and Gazor, 1994). Here, we omit such derivations as they are lengthy and a general
solution turns out to be difficult to derive. Instead, we rely on simulations to verify the
optimality of the VSLMS algorithm (Section 14.7.4).
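
The recursions (14.3), (14.73), and (14.75) combine into a single per-sample update. The following sketch shows one possible arrangement and should not be read as the reference implementation: the function name, the argument order, and the simple clipping of each step size to the interval [`mu_min`, `mu_max`] are assumptions introduced here to keep the example stable.

```python
def vslms_step(w, mu, g_prev, x, d, rho, mu_min, mu_max):
    """One VSLMS iteration with linear step-size increments.

    w, mu   : tap weights w(n) and step sizes mu_i(n - 1)
    g_prev  : stochastic gradient terms g_i(n - 1) from the previous call
    x, d    : tap-input vector x(n) and desired output d(n)
    rho     : adaptation parameter of the step-size recursion
    mu_min, mu_max : hypothetical clipping limits (an assumption of this sketch)
    Returns w(n + 1), mu(n), g(n), and the output error e(n).
    """
    e = d - sum(wi * xi for wi, xi in zip(w, x))
    g = [-2.0 * e * xi for xi in x]                                   # Eq. (14.73)
    mu_new = [min(max(m + rho * gi * gp, mu_min), mu_max)             # Eq. (14.75), then clip
              for m, gi, gp in zip(mu, g, g_prev)]
    w_new = [wi + 2.0 * m * e * xi for wi, m, xi in zip(w, mu_new, x)]  # Eq. (14.3)
    return w_new, mu_new, g, e


if __name__ == "__main__":
    w, mu, g = [0.0, 0.0], [0.1, 0.1], [0.0, 0.0]
    for x, d in (([1.0, 0.5], 1.0), ([0.5, -1.0], 0.2), ([-1.0, 1.0], -0.4)):
        w, mu, g, e = vslms_step(w, mu, g, x, d, rho=0.01, mu_min=1e-4, mu_max=0.3)
        print(e, mu)
```

Fixed clipping limits are only one way of keeping the step sizes within a stable range; a limit tied to the condition of Eq. (14.36) would require an estimate of the tap-input powers.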


14.7.2 Variations and Extensions 
Sign Update Equation 


Recall from Chapter 6 that the stochastic gradient terms, $g_i(n)$ and $g_i(n-1)$, in
Eq. (14.75) may be replaced by their respective signs, to obtain

$$
\begin{aligned}
\mu_i(n) &= \mu_i(n-1) + \rho\,\mathrm{sign}[g_i(n)]\cdot\mathrm{sign}[g_i(n-1)] \\
&= \mu_i(n-1) + \rho\,\mathrm{sign}[g_i(n)g_i(n-1)]
\end{aligned}
\tag{14.77}
$$

This may be referred to as the sign update equation, in analogy with the LMS sign algorithm
(Section 6.5).

Multiplicative versus Linear Increments


Other variations of the step-size update equations that have been proposed in the literature
are (Farhang-Boroujeny and Gazor, 1994)

$$\mu_i(n) = (1 + \rho g_i(n)g_i(n-1))\mu_i(n-1) \tag{14.78}$$

and its sign update version

$$\mu_i(n) = (1 + \rho\,\mathrm{sign}[g_i(n)g_i(n-1)])\mu_i(n-1) \tag{14.79}$$

For easy reference, we refer to Eqs. (14.75) and (14.77) as step-size update equations
with linear increments and Eqs. (14.78) and (14.79) as step-size update equations with
multiplicative increments. We make some comments on the performance of the linear and
multiplicative increments later in Section 14.7.4.

Clearly, for a small $\rho$, Eq. (14.78) reaches its steady state when Eq. (14.76) is satisfied.
This shows that both Eqs. (14.75) and (14.78) converge to the same set of step-size
parameters. Similarly, Eqs. (14.77) and (14.79) converge to the same set of step-size
parameters. Furthermore, when $g_i(n)g_i(n-1)$ has a symmetrical distribution around its
mean (a case likely to happen, at least approximately, in most applications) and $\rho$ is
small, all of these step-size update equations converge to the same set of parameters.
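
The four step-size update equations differ only in how the product $g_i(n)g_i(n-1)$ enters the recursion, which the following side-by-side sketch makes explicit. The function names are illustrative, and the convention that the sign of zero is zero is an assumption of this example.

```python
def sign(v):
    """Sign function; returns 0 for a zero argument (an assumed convention)."""
    return (v > 0) - (v < 0)


def linear_update(mu_prev, g, g_prev, rho):
    """Eq. (14.75): linear increment."""
    return mu_prev + rho * g * g_prev


def linear_sign_update(mu_prev, g, g_prev, rho):
    """Eq. (14.77): linear increment, sign version."""
    return mu_prev + rho * sign(g * g_prev)


def multiplicative_update(mu_prev, g, g_prev, rho):
    """Eq. (14.78): multiplicative increment."""
    return (1.0 + rho * g * g_prev) * mu_prev


def multiplicative_sign_update(mu_prev, g, g_prev, rho):
    """Eq. (14.79): multiplicative increment, sign version."""
    return (1.0 + rho * sign(g * g_prev)) * mu_prev


if __name__ == "__main__":
    for update in (linear_update, linear_sign_update,
                   multiplicative_update, multiplicative_sign_update):
        print(update.__name__, update(0.1, 2.0, 3.0, 0.01))
```

With multiplicative increments the size of each change scales with the current step size, so a given $\rho$ produces proportionally smaller adjustments when $\mu_i(n-1)$ is small.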


VSLMS Algorithm with a Common Step-Size Parameter 


In many applications, to keep the complexity of the filter low, we are often interested in
using a common step-size parameter, $\mu(n)$, for all the filter taps. Following a similar line
of derivations as in Eqs. (14.68)-(14.75), the following recursion can be easily derived
(Mathews and Xie, 1993):

$$\mu(n) = \mu(n-1) + \rho\,e(n)e(n-1)\,\mathbf{x}^T(n)\mathbf{x}(n-1) \tag{14.80}$$

The sign version of this recursion may be given as

$$\mu(n) = \mu(n-1) + \rho\,\mathrm{sign}[e(n)e(n-1)\,\mathbf{x}^T(n)\mathbf{x}(n-1)] \tag{14.81}$$

These are recursions with linear increments. Extension of these recursions to those with
multiplicative increments is straightforward.
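
A minimal sketch of one iteration with a common step-size parameter is given below. It assumes the LMS tap update $\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu(n)e(n)\mathbf{x}(n)$ and clamps $\mu(n)$ to a range $[\mu_{\min}, \mu_{\max}]$. The function name, the clamping bounds, and the numbers in the usage lines are hypothetical.

```python
def vslms_common_step(w, x_now, x_prev, e_prev, d_now,
                      mu_prev, rho, mu_min, mu_max):
    """One iteration of LMS with a single adaptive step size (linear increment)."""
    # A priori output error for the current input vector.
    e_now = d_now - sum(wi * xi for wi, xi in zip(w, x_now))
    # Step-size recursion: mu(n) = mu(n-1) + rho e(n) e(n-1) x^T(n) x(n-1).
    inner = sum(a * b for a, b in zip(x_now, x_prev))
    mu = mu_prev + rho * e_now * e_prev * inner
    # Keep the step size inside an assumed stability range.
    mu = min(max(mu, mu_min), mu_max)
    # Assumed LMS tap update: w(n+1) = w(n) + 2 mu(n) e(n) x(n).
    w_next = [wi + 2.0 * mu * e_now * xi for wi, xi in zip(w, x_now)]
    return w_next, mu, e_now


# Hypothetical two-tap example.
w_next, mu, e_now = vslms_common_step(
    w=[0.0, 0.0], x_now=[1.0, -1.0], x_prev=[-1.0, 1.0],
    e_prev=0.5, d_now=1.0, mu_prev=0.05, rho=0.01,
    mu_min=0.001, mu_max=0.1)
```

Only one inner product and one scalar recursion are added to the conventional LMS iteration, which is why a common step size keeps the complexity low.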


VSLMS Algorithm for Complex-Valued Case 
For filters with complex-valued input, following a line of derivation similar to the one that
led to Eq. (14.75) results in the following recursion (Problem P14.7):

$$\mu_i(n) = \mu_i(n-1) + \rho\,\big(g_{i,R}(n)g_{i,R}(n-1) + g_{i,I}(n)g_{i,I}(n-1)\big) \tag{14.82}$$

Here, the subscripts R and I refer to the real and imaginary parts of $g_i(n)$ and $g_i(n-1)$,
and

$$g_i(n) = -2e^*(n)x_i(n) \tag{14.83}$$

The sign version of Eq. (14.82) is

$$\mu_i(n) = \mu_i(n-1) + \rho\,\big(\mathrm{sign}[g_{i,R}(n)g_{i,R}(n-1)] + \mathrm{sign}[g_{i,I}(n)g_{i,I}(n-1)]\big) \tag{14.84}$$


Extensions of these results to update equations with multiplicative increments and also 
to the case where a common step-size parameter is used for all the filter taps are 
straightforward. 
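
The increment in Eq. (14.82) can be checked numerically. The sketch below computes it from real and imaginary parts and, equivalently, as the real part of $g_i(n)g_i^*(n-1)$. It also computes the increment of the sign version, Eq. (14.84). Function names and the sample values are assumptions for illustration.

```python
def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def increment_parts(g_now, g_prev):
    """g_R(n) g_R(n-1) + g_I(n) g_I(n-1), as in Eq. (14.82)."""
    return g_now.real * g_prev.real + g_now.imag * g_prev.imag


def increment_conj(g_now, g_prev):
    """Same quantity written as Re[g(n) g*(n-1)]."""
    return (g_now * g_prev.conjugate()).real


def increment_sign(g_now, g_prev):
    """Increment of the sign version, Eq. (14.84)."""
    return (sign(g_now.real * g_prev.real)
            + sign(g_now.imag * g_prev.imag))


# Hypothetical gradient samples.
g_n, g_nm1 = complex(1.0, 2.0), complex(3.0, -1.0)
a = increment_parts(g_n, g_nm1)
b = increment_conj(g_n, g_nm1)
c = increment_sign(g_n, g_nm1)
```

Note that the sign version can take the values -2, 0, or +2 only when both products are nonzero, so the real and imaginary parts may cancel each other, as they do for the sample values used here.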

The last point to be noted here is that the step-size parameters should always be lim- 
ited to a range that satisfies the stability requirement of the LMS algorithm. Equation 
(14.36) specifies the condition that must be satisfied by the step-size parameters to guar- 
antee stability. However, how this is implemented in actual practice to limit the step-size 
parameters can vary. In Section 14.7.4, we discuss a possible way of limiting the step-size 
parameters. 


14.7.3 Normalization of the Parameter $\rho$


In the update equations (14.75) and (14.78), and similar equations for the complex-valued
case, the step-size increments are proportional to the size of $g_i(n)g_i(n-1)$. This might
be inappropriate when signal levels vary significantly with time, as it results in fast and
slow variations of the step-size parameters, depending on the level of the input signal, $x_i(n)$,
and also the output error, $e(n)$. To keep a more uniform control (variation) of the step-size
parameters that does not depend on signal levels, we adopt a step-normalization technique
similar to the one used in the LMS algorithm. Here, the adaptation parameter $\rho$ is replaced
by $\rho_i(n)$, which is obtained according to the following equation:

$$\rho_i(n) = \frac{\rho_0}{\hat{\sigma}^2_{g_i}(n) + \psi} \tag{14.85}$$

where $\rho_0$ is an un-normalized parameter common to all taps, $\hat{\sigma}^2_{g_i}(n)$ is an estimate of
$E[|g_i(n)|^2]$ that may be obtained through the recursion

$$\hat{\sigma}^2_{g_i}(n) = \beta\,\hat{\sigma}^2_{g_i}(n-1) + (1-\beta)\,|g_i(n)|^2 \tag{14.86}$$

where $\beta$ is a forgetting factor close to but less than 1, and $\psi$ is a positive constant that
prevents possible instability of the algorithm when $\hat{\sigma}^2_{g_i}(n)$ is small.

In the case of Eq. (14.80), the normalization is done with respect to $E[e^2(n)\mathbf{x}^T(n)\mathbf{x}(n)]$.
This can also be estimated through a time-averaging recursion similar to Eq. (14.86).
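
The normalization of Eqs. (14.85) and (14.86) may be sketched as below for a single tap. The class name and the zero initial power estimate are assumptions. The helper `steady_increment` is a hypothetical check that feeds a constant-magnitude gradient term and returns the resulting normalized increment $\rho_i(n)\,g_i^2$.

```python
class StepNormalizer:
    """Tracks an estimate of E[|g_i(n)|^2] and returns rho_i(n) for one tap."""

    def __init__(self, rho0, beta, psi, sigma2_init=0.0):
        self.rho0 = rho0          # un-normalized parameter rho_0
        self.beta = beta          # forgetting factor, close to but less than 1
        self.psi = psi            # small positive constant
        self.sigma2 = sigma2_init # running power estimate (assumed to start at 0)

    def update(self, g):
        # Eq. (14.86): exponentially weighted estimate of |g|^2.
        self.sigma2 = self.beta * self.sigma2 + (1.0 - self.beta) * abs(g) ** 2
        # Eq. (14.85): normalized adaptation parameter.
        return self.rho0 / (self.sigma2 + self.psi)


def steady_increment(level, num_iter=500):
    """Normalized increment rho_i(n) * g^2 after feeding a constant |g| = level."""
    norm = StepNormalizer(rho0=0.002, beta=0.95, psi=0.001)
    rho = 0.0
    for _ in range(num_iter):
        rho = norm.update(level)
    return rho * level * level


small = steady_increment(1.0)
large = steady_increment(10.0)
```

In this check a tenfold change in signal level leaves the normalized increment almost unchanged, which is the purpose of the normalization.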


14.7.4 Computer Simulations 


In this section, we present some simulation results to illustrate the optimal tracking behavior
of the VSLMS algorithm. As an example, we consider the application of the VSLMS
algorithm in the identification of a multipath communication channel.³

³ The simulation results presented here are simplified versions of those presented by the author in
Farhang-Boroujeny and Gazor (1994). Here, we consider the case where all variables are real-valued. In
Farhang-Boroujeny and Gazor (1994), all the variables are assumed to be complex-valued. However, the
conclusions derived from the results of this section as well as those in Farhang-Boroujeny and Gazor (1994)
are similar.

The channel is assumed to have two distinct paths with a continuous-time impulse response

$$h_{t_0}(t) = a_1(t_0)\,p(t - \tau_1(t_0)) + a_2(t_0)\,p(t - \tau_2(t_0)) \tag{14.87}$$


where $t_0$ is the time at which the channel response is given (measured), $\tau_1(t_0)$ and $\tau_2(t_0)$
are the path delays, $a_1(t_0)$ and $a_2(t_0)$ are the path gains, $t$ is the continuous time variable,
$p(t)$ is the raised-cosine pulse with 50% roll-off factor given by

$$p(t) = \frac{\sin(\pi t/T_s)}{\pi t/T_s}\cdot\frac{\cos(\pi t/2T_s)}{1-(t/T_s)^2} \tag{14.88}$$

and $T_s$ is the symbol interval. As explicitly indicated in Eq. (14.87), the path delays and
gains may vary with time, that is, they depend on the time $t_0$ at which the channel
response is given. An adaptive filter is used to track these variations.

Figure 14.2 depicts the simulation setup that is used here. The discrete-time channel
model

$$W_o(z) = \sum_{i=0}^{N-1} w_{o,i}(n)\,z^{-i} \tag{14.89}$$

is related to the continuous-time channel response $h_{t_0}(t)$ as below:

$$w_{o,i}(n) = h_{nT_s}(iT_s), \quad \text{for } i = 0, 1, \ldots, N-1 \tag{14.90}$$

where $nT_s$ is the time at which the discrete response of the channel is measured.

Figure 14.2 Modeling of a multipath communication channel.

For all simulations, we keep $\tau_1(nT_s)$ fixed at the value of $2T_s$, but let $\tau_2(nT_s)$ vary at
a constant rate from $4T_s$ to $14T_s$ over every simulation run that takes 100 000 iterations
(equivalent to 100 000 $T_s$ seconds). The discrete-time samples of path gains $a_1(nT_s)$ and $a_2(nT_s)$
are generated independently by passing two independent unit-variance white Gaussian
processes through single-pole low-pass filters with the system function

$$\frac{\sqrt{1-\alpha^2}}{1-\alpha z^{-1}} \tag{14.91}$$

where the parameter $\alpha$ is related to the channel fade rate, $f_d$, and the symbol rate, $f_s = 1/T_s$,
according to the following equation:


-Tla 
fs 


For typical values of $f_d$ that allow the VSLMS algorithm to follow variations of the
channel, the variations of $a_1(nT_s)$ and $a_2(nT_s)$ very closely approximate a random walk
model similar to the one used to develop the analytical results of this chapter (Eweda,
1994); see also Problem P14.8. This approximate realization of random walk prevents
indefinite increase in the path gains that would otherwise happen if we had used the
random walk model of Section 14.1.
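
The behavior described above can be observed with a short experiment. The sketch below generates one path gain from the first-order recursion implied by the single-pole filter of Eq. (14.91) and measures both the average power of the gain and the average power of its sample-to-sample change. The value of $\alpha$, the seed, and the run length are hypothetical choices.

```python
import math
import random


def path_gains(alpha, num_samples, rng):
    """Samples of one path gain from the recursion
    a(n+1) = alpha * a(n) + sqrt(1 - alpha^2) * v(n+1), with v(n) ~ N(0, 1)."""
    scale = math.sqrt(1.0 - alpha * alpha)
    a = rng.gauss(0.0, 1.0)  # start from the steady-state (unit-variance) distribution
    samples = []
    for _ in range(num_samples):
        samples.append(a)
        a = alpha * a + scale * rng.gauss(0.0, 1.0)
    return samples


alpha = 0.99  # hypothetical value, not derived from Eq. (14.92)
gains = path_gains(alpha, 100000, random.Random(0))

# Average power of the gain and of its sample-to-sample increment.
gain_power = sum(a * a for a in gains) / len(gains)
steps = [b - a for a, b in zip(gains, gains[1:])]
step_power = sum(s * s for s in steps) / len(steps)
```

The gain power stays close to 1 instead of growing without bound, while the increments have a power close to $1-\alpha^2$, as a random walk driven by $\sqrt{1-\alpha^2}\,\nu(n)$ would have.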

The following parameters are used in all the simulations that follow. The channel length,
$N$, is set equal to 16. The same length is also assumed for the channel model (adaptive
filter). The tails of the raised-cosine pulses associated with the two paths that lie outside
the range set by channel length are truncated. A fade rate of $f_d = f_s/2400$ is assumed.
In the implementation of the sign update equations with multiplicative increments, we
choose $\rho = 0.002$. For the conventional update Eqs. (14.75) and (14.78), the parameter
$\rho$ is normalized as discussed in Section 14.7.3. The following parameters are used:


=i (14.92) 


• for Eq. (14.75), $\beta = 0.95$, $\psi = 0.001$, and $\rho_0 = 0.0002$
• for Eq. (14.78), $\beta = 0.95$, $\psi = 0.001$, and $\rho_0 = 0.002$


The data sequence, $s(n)$, at the channel and adaptive filter input is a binary zero-mean
white random process, taking values $\pm 1$. The channel noise, $e_o(n)$, is a zero-mean white
Gaussian process with variance $\sigma^2_{e_o} = 0.02$. This choice of $\sigma^2_{e_o}$ results in an average
signal-to-noise ratio of 20 dB at the channel output.

The step-size parameters are checked at the end of every iteration of the algorithm and
limited to stay within a range that satisfies Eq. (14.36). When Eq. (14.36) is not satisfied,
all the step-size parameters are scaled down by the same factor such that $\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]$ reduces
to its upper bound 1/3. The step-size parameters are also hard limited to the minimum
value of 0.001.
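
One possible way of coding this limiting step is sketched below. It assumes a diagonal step-size matrix, so that $\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]$ reduces to the sum of $\mu_i$ times the power of the $i$th tap input, and it takes the bound 1/3 and the floor 0.001 from the description above. The function and argument names are hypothetical.

```python
def limit_step_sizes(mu, input_powers, trace_bound=1.0 / 3.0, mu_min=0.001):
    """Scale all step sizes by a common factor if tr[mu R] exceeds its bound,
    then apply the hard lower limit."""
    # For a diagonal step-size matrix, tr[mu R] = sum_i mu_i * E[x_i^2].
    trace = sum(m * p for m, p in zip(mu, input_powers))
    if trace > trace_bound:
        scale = trace_bound / trace
        mu = [m * scale for m in mu]
    # Hard limit from below (applied after scaling in this sketch).
    return [max(m, mu_min) for m in mu]


# Binary +/-1 input: every tap input has unit power; N = 16 taps.
limited = limit_step_sizes([0.1] * 16, [1.0] * 16)
floored = limit_step_sizes([0.0001, 0.01], [1.0, 1.0])
```

Because the lower limit is applied after the scaling in this sketch, the trace can end up marginally above the bound when many taps sit at the floor. With 16 taps and a floor of 0.001 the floor alone contributes at most 0.016.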

Next, we present a number of results comparing the relative performance of various 
implementations of the step-size adaptation in the present application. These results also 
serve to show optimal tracking behavior of the VSLMS algorithm. 

Figure 14.3 presents a typical result comparing the performance of the update Eqs. 
(14.75) and (14.78), that is, the recursions with linear and multiplicative increments, 
respectively. These and the subsequent results of this section are based on single sim- 
ulation runs; that is, no ensemble averaging is used. However, time averaging with a 
moving rectangular window is used to smooth the plots. We note that adaptation based
on multiplicative increments results in lower steady-state MSE, that is, a superior tracking
behavior. This may be explained by noting that the variation of step-size parameters
follows a geometrical progression with multiplicative increments and hence it can react
much faster to changes than its linear counterpart. Because of this observation, the rest of
the simulation results are given only for step-size updates with multiplicative increments.

Figure 14.4 shows a set of curves comparing the tracking behaviors of the VSLMS
algorithm and the LMS algorithm with optimum step-size parameters. The optimum step-size
parameters of the LMS algorithm are obtained according to Eq. (14.30), as explained
later. Results of both the conventional step-size update Eq. (14.78) and its sign version,
Eq. (14.79), are presented. We note that both implementations of the VSLMS algorithm
converge to about the same excess MSE as the case where optimal step-size parameters
are used. This clearly illustrates the optimal tracking behavior of the VSLMS algorithm,
as was predicted earlier in this section. We also note that there is very little difference
between the behavior of the recursion Eq. (14.78) and its sign counterpart, Eq. (14.79).

Figure 14.3 A typical simulation result comparing the VSLMS algorithm with linear and
multiplicative increments.

Figure 14.4 A typical simulation result comparing the LMS algorithm with optimum step-size
parameters and the VSLMS algorithm.

Figure 14.5 presents the results showing how the VSLMS algorithm tracks the variation
of $\mu_{o,10}(n)$, that is, the optimum step-size parameter of the 10th tap of the adaptive
filter. The results are given for the recursion Eq. (14.78) and also its sign counterpart,
Eq. (14.79). The results show that the VSLMS algorithm converges to the optimum step-size
parameters, thus achieving a close to optimum tracking behavior. Further experiments
have confirmed this optimum performance even when the adaptive filter input is colored
(Mathews and Xie, 1993; Farhang-Boroujeny and Gazor, 1994).

Figure 14.5 A typical simulation result showing that the VSLMS algorithm closely tracks the
optimum step-size parameters given by Eq. (14.30).

Computation of the optimum step-size parameters, which are used to obtain the results
of Figures 14.4 and 14.5, is carried out by finding $\sigma_{x_i}$ and $\sigma_{\varepsilon_{o,i}}$ first, and then substituting
them in Eq. (14.30). Noting that the filter input is a binary sequence, we get $\sigma_{x_i} = 1$. To
evaluate $\sigma_{\varepsilon_{o,i}}$, we recall from Eq. (14.91) that the path gains, $a_1(nT_s)$ and $a_2(nT_s)$, are
generated using the recursions

$$a_k((n+1)T_s) = \alpha\,a_k(nT_s) + \sqrt{1-\alpha^2}\,\nu_k(n+1), \quad \text{for } k = 1, 2 \tag{14.93}$$

where $\nu_1(n+1)$ and $\nu_2(n+1)$ are two independent unit-variance zero-mean Gaussian
white sequences. Assuming that $\alpha$ is smaller than but close to 1, we find that

$$1-\alpha \ll \sqrt{1-\alpha^2}$$

Thus, for $k = 1$ and 2, we obtain

$$a_k((n+1)T_s) - a_k(nT_s) = -(1-\alpha)\,a_k(nT_s) + \sqrt{1-\alpha^2}\,\nu_k(n+1)
\approx \sqrt{1-\alpha^2}\,\nu_k(n+1) \tag{14.94}$$


where the latter approximation is (statistically) justified by noting that $a_k(nT_s)$ and
$\nu_k(n+1)$ are two random variables with the same variance. On the other hand, from Eq.
(14.2) we get $\boldsymbol{\varepsilon}_o(n) = \mathbf{w}_o(n+1) - \mathbf{w}_o(n)$, or

$$\varepsilon_{o,i}(n) = w_{o,i}(n+1) - w_{o,i}(n) \tag{14.95}$$

Next, substituting Eq. (14.90) in Eq. (14.95), and assuming that the path delays $\tau_1(nT_s)$
and $\tau_2(nT_s)$ vary very slowly in time so that their variations over the span of the channel
length ($NT_s$ seconds) can be ignored, we obtain using Eq. (14.94)

$$\varepsilon_{o,i}(n) = \sqrt{1-\alpha^2}\,\big[\nu_1(n+1)\,p(iT_s - \tau_1(nT_s)) + \nu_2(n+1)\,p(iT_s - \tau_2(nT_s))\big] \tag{14.96}$$

for $i = 0, 1, \ldots, N-1$. Using Eq. (14.96), and recalling that $\nu_1(n+1)$ and $\nu_2(n+1)$
are unit-variance independent processes, we obtain

$$\sigma^2_{\varepsilon_{o,i}}(n) = (1-\alpha^2)\big(p^2(iT_s - \tau_1(nT_s)) + p^2(iT_s - \tau_2(nT_s))\big) \tag{14.97}$$

or

$$\sigma_{\varepsilon_{o,i}}(n) = \sqrt{(1-\alpha^2)\big(p^2(iT_s - \tau_1(nT_s)) + p^2(iT_s - \tau_2(nT_s))\big)} \tag{14.98}$$
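
As an illustration, the per-tap standard deviations of Eq. (14.98) can be evaluated numerically from the raised-cosine pulse of Eq. (14.88). The sketch below does this for a given pair of path delays. The value of $\alpha$ used in the example is hypothetical, and the handling of the removable singularities of $p(t)$ at $t = 0$ and $t = \pm T_s$ is an implementation choice of this sketch.

```python
import math


def raised_cosine(t, Ts=1.0):
    """Raised-cosine pulse with 50% roll-off, Eq. (14.88)."""
    u = t / Ts
    if abs(u) < 1e-12:
        return 1.0                      # limit at t = 0
    if abs(abs(u) - 1.0) < 1e-9:
        return 0.0                      # limit of the 0/0 form at t = +/-Ts
    return (math.sin(math.pi * u) / (math.pi * u)
            * math.cos(math.pi * u / 2.0) / (1.0 - u * u))


def lag_std(alpha, tau1, tau2, N=16, Ts=1.0):
    """sigma_{eps_o,i}(n) of Eq. (14.98) for taps i = 0, ..., N-1."""
    out = []
    for i in range(N):
        p1 = raised_cosine(i * Ts - tau1, Ts)
        p2 = raised_cosine(i * Ts - tau2, Ts)
        out.append(math.sqrt((1.0 - alpha ** 2) * (p1 ** 2 + p2 ** 2)))
    return out


# Hypothetical alpha; delays of 2 Ts and 4 Ts (the start of a simulation run).
sigma = lag_std(alpha=0.99, tau1=2.0, tau2=4.0)
```

With integer delays the pulses are sampled at their zero crossings, so only the taps aligned with the two paths vary. As the second delay moves between integer multiples of $T_s$, the variation spreads over neighbouring taps.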


14.8 RLS Algorithm with Variable Forgetting Factor 


In this section, we extend the idea of the VSLMS algorithm to propose an RLS algorithm with
variable forgetting factor. To this end, we recall from the last chapter that in the steady
state, the RLS algorithm is approximately equivalent to the LMS-Newton algorithm. In
particular, when the filter input is stationary, in the steady state, the RLS recursion is
approximately equivalent to the recursion

$$\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}(n-1) + (1-\lambda(n))\,\hat{e}_{n-1}(n)\,\mathbf{R}^{-1}\mathbf{x}(n) \tag{14.99}$$

(see Eq. (12.90)). Note that here we have added the time index $n$ to the forgetting factor
$\lambda(n)$ to emphasize that it may vary with time.

Starting with Eq. (14.99) and following the same line of derivations as those used
to derive the VSLMS algorithm, the following update equation is obtained for adaptive
adjustment of $\lambda(n)$:

$$\lambda(n) = \lambda(n-1) - \rho\,z(n) \tag{14.100}$$

where

$$z(n) = \big(2\hat{e}_{n-1}(n)\mathbf{x}(n)\big)^T\big(2\hat{e}_{n-2}(n-1)\mathbf{R}^{-1}\mathbf{x}(n-1)\big) \tag{14.101}$$


A more robust update equation for adaptation of $\lambda(n)$ is obtained by defining

$$\beta(n) = 1 - \lambda(n) \tag{14.102}$$

and noting that Eq. (14.100) can equivalently be written as

$$\beta(n) = \beta(n-1) + \rho\,z(n) \tag{14.103}$$

Moreover, from our experience with the VSLMS algorithm, we may suggest using
multiplicative increments instead of linear increments, and also replacing $z(n)$ by its sign, to
obtain the following recursion:

$$\beta(n) = (1 + \rho\,\mathrm{sign}[z(n)])\,\beta(n-1) \tag{14.104}$$


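To get a more easily usable expression for $z(n)$, from Chapter 12, we recall that in the
steady state

$$\boldsymbol{\Psi}_\lambda(n) \approx \frac{1}{1-\lambda(n)}\,\mathbf{R} \tag{14.105}$$

Rearranging Eq. (14.105) and replacing $n$ by $n-1$, we obtain

$$\mathbf{R}^{-1} \approx \frac{1}{1-\lambda(n-1)}\,\boldsymbol{\Psi}_\lambda^{-1}(n-1) \tag{14.106}$$

Substituting Eq. (14.106) in Eq. (14.101) and using the definition (12.48), we obtain

$$z(n) \approx \frac{4\,\hat{e}_{n-1}(n)\hat{e}_{n-2}(n-1)}{1-\lambda(n-1)}\,\mathbf{x}^T(n)\boldsymbol{\Psi}_\lambda^{-1}(n-1)\mathbf{x}(n-1)
= \frac{4\,\hat{e}_{n-1}(n)\hat{e}_{n-2}(n-1)}{1-\lambda(n-1)}\,\mathbf{x}^T(n)\mathbf{k}(n-1) \tag{14.107}$$

where $\mathbf{k}(n)$ is the gain vector of the RLS algorithm. Taking the sign of $z(n)$ and recalling
that $1 - \lambda(n)$ is positive, as $\lambda(n) < 1$, we get

$$\mathrm{sign}[z(n)] = \mathrm{sign}[\hat{e}_{n-1}(n)\hat{e}_{n-2}(n-1)] \times \mathrm{sign}[\mathbf{x}^T(n)\mathbf{k}(n-1)] \tag{14.108}$$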


Using the above results, Table 14.2 presents a summary of the implementation of the RLS
algorithm with variable forgetting factor (compare this with Table 12.2). Note that after
every iteration, the forgetting factor, $\lambda(n)$, is checked and limited to some preselected
values, $\lambda^+$ and $\lambda^-$.
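
The forgetting-factor part of this procedure may be sketched as follows. The function implements the sign recursion of Eq. (14.104), with the sign of $z(n)$ taken from Eq. (14.108), followed by the limiting to $[\lambda^-, \lambda^+]$. The function name, argument names, and the numerical values in the usage lines are hypothetical.

```python
def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def update_forgetting_factor(beta_prev, e_now, e_prev, x_now, k_prev,
                             rho, lam_min, lam_max):
    """Return (lambda(n), beta(n)), where beta(n) = 1 - lambda(n)."""
    # sign[z(n)] = sign[e(n) e(n-1)] * sign[x^T(n) k(n-1)], Eq. (14.108).
    x_dot_k = sum(x * k for x, k in zip(x_now, k_prev))
    s = sign(e_now * e_prev) * sign(x_dot_k)
    # Multiplicative sign update, Eq. (14.104).
    beta = (1.0 + rho * s) * beta_prev
    lam = 1.0 - beta
    # Keep lambda(n) within the preselected limits.
    if lam > lam_max:
        lam, beta = lam_max, 1.0 - lam_max
    elif lam < lam_min:
        lam, beta = lam_min, 1.0 - lam_min
    return lam, beta


# Errors of the same sign and x^T(n) k(n-1) > 0: beta grows, lambda shrinks.
lam1, beta1 = update_forgetting_factor(
    0.01, 0.3, 0.2, [1.0, -1.0], [0.5, 0.1], 0.05, 0.9, 0.999)
# Errors of opposite sign: beta shrinks and lambda is clipped at its upper limit.
lam2, beta2 = update_forgetting_factor(
    0.001, 0.3, -0.2, [1.0, -1.0], [0.5, 0.1], 0.05, 0.9, 0.999)
```

A smaller $\lambda(n)$ corresponds to a shorter memory, so in this sketch the memory is shortened when successive errors are positively correlated, and lengthened otherwise.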


14.9 Summary 


In this chapter, we studied tracking behavior of various adaptive filtering algorithms in 
the context of a system modeling problem. We introduced and analyzed a generalized 
formulation of the LMS algorithm that could cover most of the algorithms that have been 
discussed in the previous chapters. The general conclusion derived from the analysis is 
that convergence and tracking are two different phenomena, and hence should be treated 
separately. We found that the algorithms that were previously introduced to speed up the 
convergence of adaptive filters do not necessarily have superior tracking behavior. We 
presented cases where the conventional LMS algorithm, which has the slowest conver- 
gence behavior, has better tracking behavior than those with more complicated structures, 
such as TDLMS or even RLS algorithm. 

We also considered the VSLMS algorithm (of Chapter 6) as an adaptive filtering scheme
with optimum tracking. The optimal tracking behavior of the VSLMS algorithm was
confirmed through computer simulations. The idea of the VSLMS algorithm was also
extended to the RLS algorithm by introducing a similar adaptive mechanism for controlling
its forgetting factor (memory length).

Table 14.2 Summary of the RLS algorithm with variable forgetting factor.

Input:  Tap-weight vector estimate, $\hat{\mathbf{w}}(n-1)$,
        Input vector, $\mathbf{x}(n)$, Desired output, $d(n)$,
        Forgetting factor, $\lambda(n-1)$, Gain vector $\mathbf{k}(n-1)$,
        and the matrix $\boldsymbol{\Psi}_\lambda^{-1}(n-1)$.
Output: Filter output, $\hat{y}_{n-1}(n)$,
        Tap-weight vector update, $\hat{\mathbf{w}}(n)$,
        Forgetting factor $\lambda(n)$, and the updated matrix $\boldsymbol{\Psi}_\lambda^{-1}(n)$.

1. Computation of the gain vector:
   $\mathbf{u}(n) = \boldsymbol{\Psi}_\lambda^{-1}(n-1)\mathbf{x}(n)$
   $\mathbf{k}(n) = \dfrac{\mathbf{u}(n)}{\lambda(n-1) + \mathbf{x}^T(n)\mathbf{u}(n)}$
2. Filtering:
   $\hat{y}_{n-1}(n) = \hat{\mathbf{w}}^T(n-1)\mathbf{x}(n)$
3. Error estimation:
   $\hat{e}_{n-1}(n) = d(n) - \hat{y}_{n-1}(n)$
4. Tap-weight vector adaptation:
   $\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}(n-1) + \mathbf{k}(n)\hat{e}_{n-1}(n)$
5. $\boldsymbol{\Psi}_\lambda^{-1}(n)$ update:
   $\boldsymbol{\Psi}_\lambda^{-1}(n) = \mathrm{Tri}\{\lambda^{-1}(n-1)(\boldsymbol{\Psi}_\lambda^{-1}(n-1) - \mathbf{k}(n)\mathbf{u}^T(n))\}$
6. $\lambda(n)$ update:
   $\beta(n) = \{1 + \rho\,\mathrm{sign}[\hat{e}_{n-1}(n)\hat{e}_{n-2}(n-1)] \times \mathrm{sign}[\mathbf{x}^T(n)\mathbf{k}(n-1)]\}\,\beta(n-1)$
   $\lambda(n) = 1 - \beta(n)$
   if $\lambda(n) > \lambda^+$, $\lambda(n) = \lambda^+$ and $\beta(n) = 1 - \lambda^+$
   if $\lambda(n) < \lambda^-$, $\lambda(n) = \lambda^-$ and $\beta(n) = 1 - \lambda^-$

Most of the present literature on tracking behavior of adaptive filters and also our
discussion in this chapter are limited to the case where the adaptive filter input is a
stationary process.⁴ Only the desired output of the filter has been assumed to be
nonstationary. We may thus say that the treatment of the problem of tracking in the present
literature is rather immature, and much more work needs to be done on this very important
topic.

⁴ An exception is the work of Macchi and Bershad (1991) where they have compared the tracking performance of
the LMS and RLS algorithms in recovering a chirped sinusoid. This is an example where the adaptive filter input
is nonstationary.


Problems 


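P14.1 In the derivation of Eq. (14.19), we assumed that $\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}\mathbf{K}\mathbf{R}] \ll \mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]\,\xi_{\mathrm{ex}}$. There,
we remarked that on average this assumption becomes more accurate (valid) as
the filter length, $N$, increases. In this problem, we examine the validity of the
assumption for a few specific cases.

(i) Show that in the case where $\mathbf{R} = \mathbf{I}$ ($\mathbf{I}$ is the identity matrix) and $\boldsymbol{\mu} = \mu\mathbf{I}$,
that is, a single scalar step-size parameter is used for all the taps,

$$\frac{\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}\mathbf{K}\mathbf{R}]}{\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]\,\xi_{\mathrm{ex}}} = \frac{1}{N} \tag{P14.1.1}$$

(ii) Consider the case where $\mathbf{R}$ is diagonal and the elements of $\boldsymbol{\mu}$ are chosen
according to Eq. (14.4). Show that in this case also Eq. (P14.1.1) holds.

(iii) Consider the case where $\boldsymbol{\mu} = \mu\mathbf{I}$, but $\mathbf{R}$ and $\mathbf{G}$ are arbitrary correlation
matrices. Use the decomposition $\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$ (Eq. (4.19) of Chapter 4), to
show that

$$\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}\mathbf{K}\mathbf{R}] = \mu\sum_{i=0}^{N-1}\lambda_i^2\,k'_{ii}$$

and

$$\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]\,\xi_{\mathrm{ex}} = \mu\left(\sum_{i=0}^{N-1}\lambda_i\right)\left(\sum_{i=0}^{N-1}\lambda_i\,k'_{ii}\right)$$

where $\lambda_i$ and $k'_{ii}$ are the $i$th diagonal elements of the matrices $\boldsymbol{\Lambda}$ and
$\mathbf{K}' = \mathbf{Q}^T\mathbf{K}\mathbf{Q}$, respectively. Then, study the ratio

$$\frac{\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}\mathbf{K}\mathbf{R}]}{\mathrm{tr}[\boldsymbol{\mu}\mathbf{R}]\,\xi_{\mathrm{ex}}}$$

and find how the distribution of the $\lambda_i$'s and $k'_{ii}$'s affects this ratio.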


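P14.2 In the case of the conventional LMS algorithm, that is, where $\boldsymbol{\mu} = \mu\mathbf{I}$ with $\mu$ being
a scalar, show that

(i)
$$\xi_{\mathrm{ex}} = \frac{1}{1-\mu\,\mathrm{tr}[\mathbf{R}]}\left(\mu\,\sigma^2_{e_o}\,\mathrm{tr}[\mathbf{R}] + \frac{1}{4\mu}\,\mathrm{tr}[\mathbf{G}]\right)$$

(ii) Assuming that $\mu\,\mathrm{tr}[\mathbf{R}] \ll 1$ for the range of interest of $\mu$, show that the
optimum value of $\mu$ that minimizes $\xi_{\mathrm{ex}}$ is

$$\mu_o = \frac{1}{2\sigma_{e_o}}\sqrt{\frac{\mathrm{tr}[\mathbf{G}]}{\mathrm{tr}[\mathbf{R}]}}$$

(iii) Show that when $\mu = \mu_o$, the noise and lag misadjustments of the LMS
algorithm are equal.

(iv) Show that the minimum value of $\xi_{\mathrm{ex}}$ is given by

$$\xi_{\mathrm{ex},o} \approx \sigma_{e_o}\sqrt{\mathrm{tr}[\mathbf{R}]\,\mathrm{tr}[\mathbf{G}]}$$

P14.3 A shortcoming of the result of the last problem is that when $\sigma_{e_o}$ is very small
(i.e., the plant noise is very small), the calculated optimum step-size parameter,
$\mu_o$, may become excessively large, resulting in an unstable LMS algorithm.
Recalculate $\mu_o$ and $\xi_{\mathrm{ex},o}$ without imposing the condition $\mu\,\mathrm{tr}[\mathbf{R}] \ll 1$ and show
that the results are as follows:

$$\mu_o = \frac{\sqrt{\mathrm{tr}[\mathbf{G}]/\mathrm{tr}[\mathbf{R}]}}{\sqrt{\mathrm{tr}[\mathbf{G}]\,\mathrm{tr}[\mathbf{R}]} + \sqrt{\mathrm{tr}[\mathbf{G}]\,\mathrm{tr}[\mathbf{R}] + 4\sigma^2_{e_o}}}$$

and

$$\xi_{\mathrm{ex},o} = \frac{\left(\sqrt{\mathrm{tr}[\mathbf{G}]\,\mathrm{tr}[\mathbf{R}]} + \sqrt{\mathrm{tr}[\mathbf{G}]\,\mathrm{tr}[\mathbf{R}] + 4\sigma^2_{e_o}}\right)\sqrt{\mathrm{tr}[\mathbf{G}]\,\mathrm{tr}[\mathbf{R}]}}{2}$$

Simplify these results when the plant noise is zero.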
P14.4 Show that the solution of Eq. (14.27) is Eq. (14.28). 
P14.5 Give a detailed derivation of Eq. (14.29).
P14.6 Give a detailed derivation of Eq. (14.80). 
P14.7 This problem aims at giving a derivation of the VSLMS algorithm for complex- 
valued case. 
(i) Show that Eq. (14.82) can also be written as
$$\mu_i(n) = \mu_i(n-1) + \rho\,\Re[g_i(n)g_i^*(n-1)] \tag{P14.7.1}$$
where $\Re[x]$ denotes the real part of $x$.
(ii) Following a similar line of derivation to those in Section 14.7.1, give a 
detailed derivation of Eq. (P14.7.1). 
P14.8 Consider the Markovian process, $w(n)$, generated through the recursive equation

$$w(n) = \alpha\,w(n-1) + \nu(n)$$

where $\alpha$ is a parameter in the range of $-1$ to $+1$, and $\nu(n)$ is a stationary white
noise with variance $\sigma^2_\nu$.

(i) Define the sequence $\varepsilon(n) = w(n) - w(n-1)$ and show that

$$\phi_{\varepsilon\varepsilon}(k) = \frac{\alpha-1}{\alpha+1}\,\alpha^{|k|-1}\,\sigma^2_\nu, \quad \text{for } k \neq 0$$

where $\phi_{\varepsilon\varepsilon}(k)$ is the autocorrelation function of $\varepsilon(n)$.

(ii) Use the result of (i) to argue that the results presented in this chapter are
also valid (within a good approximation) when the tap weights of the plant
in the modeling problem of Figure 14.1 vary according to a Markovian
model with a parameter $\alpha$ smaller than, but close to, 1.

P14.9 Consider the LMS-Newton recursion

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\,\mathbf{R}^{-1}e(n)\mathbf{x}(n)$$

when applied for tracking in the modeling problem of Section 14.1.

(i) Show that

$$\xi_{\mathrm{ex}} = \frac{1}{1-\mu N}\left(\mu\,\sigma^2_{e_o}N + \frac{1}{4\mu}\,\mathrm{tr}[\mathbf{R}\mathbf{G}]\right)$$

(ii) Let $\mu_o$ denote the optimum value of $\mu$, which results in minimum $\xi_{\mathrm{ex}}$.
Assuming that the plant changes very slowly so that $\mu_o N \ll 1$, show that

$$\mu_o = \frac{1}{2\sigma_{e_o}}\sqrt{\frac{\mathrm{tr}[\mathbf{R}\mathbf{G}]}{N}}$$

(iii) Obtain an approximate expression for $\xi_{\mathrm{ex},o}$, that is, the minimum value of $\xi_{\mathrm{ex}}$,
and compare that with your results in Part (iv) of Problem P14.2.

(iv) Repeat Parts (ii) and (iii) for the case where the condition $\mu_o N \ll 1$ does
not hold.

P14.10 Suggest a variable step-size implementation of the LMS-Newton algorithm of
Table 11.7.

P14.11 Give a detailed derivation of Eq. (14.101).


Computer-Oriented Problems 


P14.12 Consider a two-tap modeling problem with the parameters as specified in Case 3
of Section 14.5. Develop a simulation program for implementation of the LMS
and TDLMS algorithms in this case. By running your program confirm that the
predictions made through the numerical results of Table 14.1 are consistent with
simulations. That is, the choice of $\theta = \pi/8$ results in the least steady-state MSE,
and $\theta = \pi/4$ results in the maximum steady-state MSE.

Tip: To generate a random vector $\mathbf{x}(n)$ with a correlation matrix $\mathbf{R}$, you may
proceed as follows. First, find a square matrix $\mathbf{L}$ that satisfies $\mathbf{R} = \mathbf{L}^T\mathbf{L}$. You may
then generate $\mathbf{x}(n)$ according to the equation $\mathbf{x}(n) = \mathbf{L}^T\mathbf{u}(n)$, where $\mathbf{u}(n)$ is a
random vector with the correlation matrix equal to the identity matrix. If you are
using MATLAB, a convenient way of obtaining a square matrix $\mathbf{L}$ that satisfies
$\mathbf{R} = \mathbf{L}^T\mathbf{L}$ is by using the function "chol" that finds the Cholesky factorization of
$\mathbf{R}$. The same method can be used to generate a random vector $\boldsymbol{\varepsilon}_o(n)$ with a
correlation matrix $\mathbf{G}$.

P14.13 The MATLAB simulation program used to generate the results of
Figures 14.3-14.5 is available on an accompanying website. It is called
"vs_mdlg.m." Experiment with this program and confirm the results of
Figures 14.3-14.5. Try also other variations of the simulation parameters to
gain a better understanding of the behavior of various implementations of the
VSLMS algorithm.

P14.14 Develop a simulation program to study the convergence behavior of the
VSLMS algorithm in the channel modeling application that was discussed
in Section 14.7.4, when a common step-size parameter is used for all taps.
Compare your results with those of Figures 14.3-14.5, and discuss your
observations.

P14.15 Develop a simulation program to study the convergence behavior of the RLS
algorithm with variable forgetting factor (Table 14.2) in the channel modeling
application that was discussed in Section 14.7.4. Compare your results with
those of Figures 14.3-14.5 and also your results in Problem P14.14. Discuss
your observations.


15 


Echo Cancellation 


The presence of echoes in telephone line networks is an inevitable phenomenon that, 
unless removed, can be very annoying to the calling parties. Two types of echoes are 
encountered: (i) echoes arising from imbalanced hybrid bridges in the circuits, referred 
to as hybrid echoes (HEs), and (ii) echoes arising from acoustic coupling between
loudspeakers/earphones and microphones, referred to as acoustic echoes (AEs). In Chapter 1,
Section 1.6.4, a few introductory comments related to these two types of echo cancellers
were presented. The purpose of this chapter is to dig into the details and discuss the
various adaptive filtering algorithms that have proved useful in the implementation of
echo cancellers. Moreover, the chapter addresses the problems of double-talk and positive 
feedback in echo loops, both of which can have detrimental effects on the operation of 
echo cancellers. In addition, a large part of this chapter is devoted to the issue of stereo- 
phonic AE cancellation, which has its own theoretical and practical peculiarities that need 
special treatment. 


15.1 The Problem Statement 


Figure 15.1 presents a schematic diagram in which the various sources of echo in a 
telephone line network are identified. Hybrid bridges/circuits that act as an interface 
between the bidirectional two-wire lines and the single direction four-wire lines are spread 
in central switching offices as well as at the subscriber premises. HEs will exist (to some 
degree) at all the hybrid bridges. Long duration echoes can occur when an echo happens 
at a hybrid bridge and later reflects back from another hybrid bridge at some distance. 
This can lead to echoes with durations of as much as a few hundreds of milliseconds. 
However, the more common echoes have durations of less than 30 ms; reflections from 
very far hybrids are likely to be significantly attenuated and thus are usually negligible. 

The acoustic link between the loudspeaker(s) and the microphone(s) in a teleconferenc- 
ing setup results in echoes whose duration (depending on the room size and the furniture 
type in the room as well as the walls material) can stretch over a few hundreds of 
milliseconds, that is, an order of magnitude larger than the HEs. 

We recall from Chapter 1 that echo cancellation may be cast as a modeling problem.
However, there are a number of practical issues that make echo cancellation a nontrivial
modeling problem. The long duration of the echo response translates into an adaptive
filter with a relatively large number of tap weights. For example, if we assume that the
signals have been sampled at a (minimum) rate of 8 kilosamples per second, an HE with
a time span of 30 ms requires an adaptive filter with $30 \times 8 = 240$ taps to be modeled.
This may increase by an order of magnitude for modeling an AE in a typical conference
room. Clearly, the implementation of an adaptive filter with a large number of taps
incurs a significant cost in computational complexity. Hence, adaptation algorithms such
as variations of the least-squares methods are too demanding and may be unacceptable
for the implementation of echo cancellers, particularly for the implementation of AE
cancellers.

Figure 15.1 Schematic diagram presenting various sources of echo in a telephone line network
(A.E.: acoustic echo; H.E.: hybrid echo).

Another common problem of echo cancellers is that the underlying processes are speech 
signals. Speech signals are hard to deal with when applied as inputs to adaptive filters. 
Firstly, they are nonstationary processes with a very wide dynamic power range and a 
time-varying power spectral density. Secondly, their power spectral density over the band 
of interest varies significantly, and this (see Chapter 4) leads to a very wide eigenvalue
spread of the correlation matrix of the input to the echo canceller (a transversal adaptive
filter), and hence, can significantly affect its performance. 

On the other hand, the residual power of the output error in an echo canceller can
vary significantly, owing to the presence of the so-called double-talk intervals. The term
double-talk, as its name implies, refers to the cases where both parties on the two sides of
the telephone line speak simultaneously. In that case, the residual error, ideally, is equal
to the speech signals from the near-end speaker(s) only. This concept, in the context of
an AE canceller, is depicted in Figure 15.2. The incoming signal, $x(n)$, is the speech
signal from a far-end room. The desired signal, $d(n)$, is the summation of the echo of the
incoming signal, $y(n)$, and the other sound activities within the near-end room, $e_o(n)$.
The goal is to remove the echoed copies of $x(n)$, that is, $y(n)$, from $d(n)$ and transmit
the residual signal $e(n) = e_o(n)$. Note that this reflects the remaining sound activities in
the near-end room. Note also that the notation $e_o(n)$ follows the notation of plant noise in a
modeling problem, as it has appeared on many occasions throughout this book (Figure 6.5).

Figure 15.2 Schematic diagram presenting an acoustic echo cancellation setup and the problem
of double-talk.

A transversal adaptive filter is used to model the echo paths and generate a replica 
of the echoed incoming signal in the near-end room. Obviously, when the coefficients 
of the adaptive filter are set perfectly, the output error $e(n) = e_o(n)$. However, we note
that perfect setting of the coefficients of the adaptive filter is not possible, because any 
adaptive filter is bound to suffer from some misadjustment (or, equivalently, misalignment 
of its coefficients). Such misadjustment/misalignment increases with the size (say, the 
mean-squared value) of the residual error e(n). In the absence of the double-talk, assuming 
that the background noise in the near-end room is small, e(n) will be small and, hence, 
the adaptive filter (echo canceller) will suffer from a small misadjustment/misalignment. 
On the other hand, as soon as a double-talk period starts, e(n) increases significantly and 
therefore, if the echo canceller adaptation continues (as before), its coefficients may be 
badly disturbed and hence e(n) will be contaminated with a significant level of residual 
echo. In other words, the presence of double-talk can badly disturb the operation of the 
echo canceller. To avoid this problem, a practical echo canceller should be equipped with 
a mechanism that identifies the presence of double-talk and takes a necessary action to 
guarantee a correct operation of the system. 

We note that although, here, we presented the problem of double-talk in the context 
of AE cancellers, the same problem exists in HE cancellers. Moreover, the solutions that 
will be discussed in Section 15.3 are applicable to both cases. 

In Figure 15.1, the loop consisting of the AE and the HE can be the source of another
problem in a teleconferencing setup. When the loop gain at a particular frequency, say,
$f_0$, is greater than 1 and the total phase shift within the loop is an integer multiple of $2\pi$, the
combination of AE and HE will make an unstable loop that begins oscillating with an
increasing amplitude at the frequency $f_0$. This phenomenon, which also commonly occurs
in any microphone-amplifier setup, is known as howling. Thus, AE cancellers should also
be equipped with a howling suppression mechanism. The common methods that may be
used for howling suppression are discussed in Section 15.4.


15.2 Structures and Adaptive Algorithms 


The LMS (least-mean-square) algorithms, such as the normalized LMS and variable
step-size LMS, unfortunately, may be too slow for adaptation of echo cancellers. This is
because the input to an echo canceller is usually a speech signal that, as noted above,
is nonstationary and highly colored. Throughout this book, we introduced a number of
algorithms that were aimed at resolving the problem of slow convergence of the LMS
algorithm when it is subject to a colored/highly correlated input. In this section, we revisit
these algorithms and provide some new comments that emphasize their applications
to echo cancellation. We also present numerical results that compare these algorithms in
an AE cancellation setup. We exclude the recursive least-squares (RLS) and RLS lattice
algorithms in our study, because we believe their relatively high complexity makes them
a weak choice in the application of interest in this chapter.


15.2.1 Normalized LMS (NLMS) Algorithm 


Recall from Chapter 6, Section 6.6, that in the NLMS (normalized least-mean-square) 
algorithm the filter tap weights are adjusted using the recursive equation 


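$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\tilde{\mu}}{\mathbf{x}^{\mathrm{T}}(n)\mathbf{x}(n) + \psi}\, e(n)\mathbf{x}(n) \tag{15.1}$$

where $\tilde{\mu}$ is the algorithm step-size parameter, and $\psi$ is a small constant that is added
to avoid possible numerical instability of the algorithm when $\mathbf{x}^{\mathrm{T}}(n)\mathbf{x}(n)$ approaches 0.
Also, here,

$$e(n) = d(n) - \hat{y}(n) \tag{15.2}$$

and

$$\hat{y}(n) = \mathbf{w}^{\mathrm{T}}(n)\mathbf{x}(n) \tag{15.3}$$

where

$$\mathbf{w}(n) = [w_0(n)\;\; w_1(n)\;\; \cdots\;\; w_{N-1}(n)]^{\mathrm{T}} \tag{15.4}$$

and

$$\mathbf{x}(n) = [x(n)\;\; x(n-1)\;\; \cdots\;\; x(n-N+1)]^{\mathrm{T}} \tag{15.5}$$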
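
The recursion of Eqs. (15.1)-(15.5) can be sketched in a few lines. The following is a minimal illustration in plain Python, not the MATLAB code that accompanies the book; the function names `nlms_step` and `identify`, the white-noise input, and the default values of `mu` and `psi` are assumptions made for the example.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def nlms_step(w, x, d, mu=0.5, psi=1e-6):
    """One NLMS iteration; returns the updated tap weights and the error e(n)."""
    y_hat = dot(w, x)                    # filter output, Eq. (15.3)
    e = d - y_hat                        # output error, Eq. (15.2)
    gain = mu / (dot(x, x) + psi)        # normalized step of Eq. (15.1)
    return [wi + gain * e * xi for wi, xi in zip(w, x)], e


def identify(w_o, n_iter=2000, mu=1.0, seed=0):
    """Identify a short FIR path w_o from white noise with no near-end signal."""
    rng = random.Random(seed)
    n_taps = len(w_o)
    w = [0.0] * n_taps
    x = [0.0] * n_taps                   # x(n) = [x(n), x(n-1), ..., x(n-N+1)]
    for _ in range(n_iter):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]
        d = dot(w_o, x)                  # noise-free desired signal
        w, _ = nlms_step(w, x, d, mu=mu)
    return w


if __name__ == "__main__":
    print(identify([0.5, -0.3, 0.1]))
```

With a noise-free desired signal and a step size of 1, the estimated taps settle on the assumed path `w_o`; a speech input, as used in the chapter, would converge more slowly because it is colored.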


Although for most of the studies in the previous chapters we considered mean-squared 
error (MSE) as a good measure of convergence of the various adaptive algorithms, one 
may find that an alternative measure of convergence that may prove useful, particularly 
in the case of AE cancellers, is the misalignment of the adaptive tap weights from their 
optimum values, defined as 


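$$\zeta(n) = E[\|\mathbf{v}(n)\|^2] \tag{15.6}$$

where $\|\mathbf{v}(n)\|^2 = \mathbf{v}^{\mathrm{T}}(n)\mathbf{v}(n)$, $\mathbf{v}(n) = \mathbf{w}(n) - \mathbf{w}_o$, and $\mathbf{w}_o$ is the optimum tap-weight vector
of the filter that we strive to find. It turns out that finding an expression for the optimum
value of $\tilde{\mu}$ that results in the fastest convergence of $\zeta(n)$ is rather straightforward.
Subtracting $\mathbf{w}_o$ from both sides of Eq. (15.1) and letting $\psi = 0$, we obtain

$$\mathbf{v}(n+1) = \mathbf{v}(n) + \frac{\tilde{\mu}}{\mathbf{x}^{\mathrm{T}}(n)\mathbf{x}(n)}\, e(n)\mathbf{x}(n) \tag{15.7}$$

Assuming that $\mathbf{x}(n)$ and $\mathbf{v}(n)$ are known, premultiplying each side of Eq. (15.7) by its
transpose and taking expectation, we obtain

$$E[\|\mathbf{v}(n+1)\|^2] = \|\mathbf{v}(n)\|^2 + \tilde{\mu}^2\,\frac{E[e^2(n)]}{\mathbf{x}^{\mathrm{T}}(n)\mathbf{x}(n)} + 2\tilde{\mu}\,\frac{E[e(n)\mathbf{v}^{\mathrm{T}}(n)\mathbf{x}(n)]}{\mathbf{x}^{\mathrm{T}}(n)\mathbf{x}(n)} \tag{15.8}$$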


Note that because we assume $\mathbf{x}(n)$ and $\mathbf{v}(n)$ are known, the expectation on the right-hand
side is applied only to the terms containing the output error $e(n)$. We also note that in an
echo canceller, $e(n)$ is the signal sent to the far-end room. Moreover, it may be written as

$$\begin{aligned}
e(n) &= d(n) - \hat{y}(n) \\
     &= e_o(n) + \mathbf{w}_o^{\mathrm{T}}\mathbf{x}(n) - \mathbf{w}^{\mathrm{T}}(n)\mathbf{x}(n) \\
     &= e_o(n) - \mathbf{v}^{\mathrm{T}}(n)\mathbf{x}(n)
\end{aligned} \tag{15.9}$$

where $e_o(n)$ is the signal corresponding to the sound activities in the near-end room,
excluding the far-end signal that is broadcast in the room through the loudspeaker. We
also note that $\mathbf{v}^{\mathrm{T}}(n)\mathbf{x}(n)$ is an undesirable residual echo. Assuming that $e_o(n)$ has a zero
mean, and defining

$$r(n) = \mathbf{v}^{\mathrm{T}}(n)\mathbf{x}(n) \tag{15.10}$$

Eq. (15.8) reduces to

$$E[\|\mathbf{v}(n+1)\|^2] = \|\mathbf{v}(n)\|^2 + \tilde{\mu}^2\,\frac{E[e^2(n)]}{\mathbf{x}^{\mathrm{T}}(n)\mathbf{x}(n)} - 2\tilde{\mu}\,\frac{r^2(n)}{\mathbf{x}^{\mathrm{T}}(n)\mathbf{x}(n)} \tag{15.11}$$

The goal is to find the value of $\tilde{\mu}$ that minimizes $E[\|\mathbf{v}(n+1)\|^2]$. This, obviously, is
obtained by solving the equation

$$\frac{\partial E[\|\mathbf{v}(n+1)\|^2]}{\partial \tilde{\mu}} = 0 \tag{15.12}$$

This leads to

$$\tilde{\mu}_{\mathrm{opt}}(n) = \frac{r^2(n)}{E[e^2(n)]} = \frac{r^2(n)}{E[e_o^2(n)] + r^2(n)} \tag{15.13}$$

where the second identity is obtained using Eq. (15.9) and recalling that $e_o(n)$ and
$r(n) = \mathbf{v}^{\mathrm{T}}(n)\mathbf{x}(n)$ are uncorrelated.
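
The differentiation behind the optimum step size can be spelled out as follows. This is only a worked restatement of the step from Eq. (15.11) to Eq. (15.13), written with $\tilde{\mu}$ for the step size, $r(n)$ for the residual echo, and $e_o(n)$ for the zero-mean near-end signal, and treating $\mathbf{x}(n)$ and $\mathbf{v}(n)$ as known.

```latex
\begin{align*}
\frac{\partial}{\partial\tilde{\mu}} E[\|\mathbf{v}(n+1)\|^2]
  &= \frac{2\tilde{\mu}\,E[e^2(n)] - 2r^2(n)}{\mathbf{x}^{\mathrm{T}}(n)\mathbf{x}(n)} = 0
  \quad\Longrightarrow\quad
  \tilde{\mu}_{\mathrm{opt}}(n) = \frac{r^2(n)}{E[e^2(n)]}, \\
E[e^2(n)]
  &= E[(e_o(n) - r(n))^2]
   = E[e_o^2(n)] - 2r(n)E[e_o(n)] + r^2(n)
   = E[e_o^2(n)] + r^2(n).
\end{align*}
```

Because the coefficient of $\tilde{\mu}^2$ in the cost is positive, the stationary point is a minimum.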

It is important to make some comments on the significance of the optimum step-size 
parameter (15.13). 


• When $E[e_o^2(n)] = 0$, that is, when $e_o(n) = 0$, one finds that $\tilde{\mu}_{\mathrm{opt}}(n) = 1$. In other words,
  if one thinks of the echo canceller as a modeling problem just as the one in Figure 6.5,
  when there is no noise at the output of the plant, $\tilde{\mu}_{\mathrm{opt}}(n) = 1$ leads to an optimum
  performance of the NLMS algorithm.

• The optimum step-size parameter decreases as $e_o(n)$ increases in power.

• At the beginning of adaptation, when $\mathbf{w}(n) = \mathbf{0}$, $r^2(n)$ is generally large and thus, unless
  $E[e_o^2(n)]$ is also large, $\tilde{\mu}_{\mathrm{opt}}(n)$ will be approximately equal to 1.

• As the algorithm iterates, $r^2(n)$ reduces, on average, and thus, in the presence of some
  nonzero $E[e_o^2(n)]$, $\tilde{\mu}_{\mathrm{opt}}(n)$ decreases. This, clearly, is a sound strategy for reducing the
  filter misadjustment and/or misalignment.

• In case $\mathbf{w}_o$ changes, $r^2(n)$ increases and, hence, $\tilde{\mu}_{\mathrm{opt}}(n)$ will increase to track the change
  of $\mathbf{w}_o$.

Despite its appealing form, unfortunately, the implementation of Eq. (15.13) is not
straightforward. This is because the quantities $r^2(n)$ and $E[e_o^2(n)]$ are not available. In
order to resolve this problem and still come up with a solution that follows the same
concept, we replace $r^2(n)$ in Eq. (15.13) by its expected value, $E[r^2(n)]$. Moreover,
assuming that $e_o(n)$ and $r(n)$ are uncorrelated processes, one finds that

$$E[e^2(n)] = E[(e_o(n) - r(n))^2] = E[e_o^2(n)] + E[r^2(n)] \tag{15.14}$$

Considering the above point, we introduce

$$\tilde{\mu}_{\mathrm{opt}}(n) \approx \frac{E[r^2(n)]}{E[e^2(n)]} \tag{15.15}$$

This result may be further rearranged as

$$\tilde{\mu}_{\mathrm{opt}}(n) \approx 1 - \frac{E[e_o^2(n)]}{E[e^2(n)]} \tag{15.16}$$

Next, we recall that the signal at the microphone output is $d(n) = e_o(n) + y(n)$, where
$y(n)$, as defined before, is the echo signal from the loudspeaker. Assuming that $e_o(n)$ and
$y(n)$ are a pair of uncorrelated processes, and that the echo canceller has converged, so that the
estimate $\hat{y}(n) \approx y(n)$, we obtain

$$E[e_o^2(n)] \approx E[d^2(n)] - E[\hat{y}^2(n)] \tag{15.17}$$

Substituting Eq. (15.17) in Eq. (15.16), we get

$$\tilde{\mu}_{\mathrm{opt}}(n) \approx 1 - \frac{E[d^2(n)] - E[\hat{y}^2(n)]}{E[e^2(n)]} \tag{15.18}$$


Unfortunately, the approximation (15.18) works only when the echo canceller has
converged and, thus, $\hat{y}(n) \approx y(n)$. It does not work, and may even become counterproductive,
when the canceller has not converged and $y(n)$ and $\hat{y}(n)$ are significantly different. In particular,
when the echo canceller starts with the initial tap weights 0, $\hat{y}(n) = 0$, $e(n) = d(n)$, and
consequently $E[d^2(n)] = E[e^2(n)]$. This in turn implies that $\tilde{\mu}_{\mathrm{opt}}(n)$ given by Eq. (15.18) is
0; hence, the adaptation will stall and the echo canceller never converges. To resolve
this problem, we suggest using the following variable step-size parameter

$$\tilde{\mu}(n) = 1 - \eta\,\frac{E[\hat{y}^2(n)]}{E[d^2(n)]} \tag{15.19}$$

where $0 < \eta < 1$ is a design parameter. Although this may not be considered an
optimum choice, it has the following appealing properties.

• At the start of the algorithm, when $\hat{y}(n) = 0$ (or approximately equal to 0), $\tilde{\mu}(n) = 1$
  will result in the fastest convergence of the NLMS algorithm. This is in line with the
  comments made above for the optimum step-size parameter given by Eq. (15.13).

• As the algorithm converges and $\hat{y}(n)$ approaches $y(n)$, assuming that $e_o(n)$ is small,
  $d(n) \approx \hat{y}(n)$, hence, $E[\hat{y}^2(n)] \approx E[d^2(n)]$ and therefore $\tilde{\mu}(n)$ approaches $1 - \eta$. This
  will allow the algorithm to converge to a misadjustment proportional to $1 - \eta$. Hence,
  the design parameter $\eta$ can be chosen to control the misadjustment of the algorithm
  as desired.


The estimates of $E[d^2(n)]$ and $E[\hat{y}^2(n)]$ can be obtained by taking the short-term
averages of the squares of the most recent samples of $d(n)$ and $\hat{y}(n)$, respectively. For instance, the
short-term averages may be calculated using the following recursive equations.

$$\sigma_d^2(n) = \alpha\,\sigma_d^2(n-1) + (1-\alpha)\,d^2(n) \tag{15.20}$$

$$\sigma_{\hat{y}}^2(n) = \alpha\,\sigma_{\hat{y}}^2(n-1) + (1-\alpha)\,\hat{y}^2(n) \tag{15.21}$$

Accordingly, we obtain

$$\tilde{\mu}(n) \approx 1 - \eta\,\frac{\sigma_{\hat{y}}^2(n)}{\sigma_d^2(n)} \tag{15.22}$$

Owing to the errors in the estimates $\sigma_d^2(n)$ and $\sigma_{\hat{y}}^2(n)$, $\tilde{\mu}(n)$ may become negative
(when $\sigma_d^2(n) < \eta\,\sigma_{\hat{y}}^2(n)$) and thus result in a divergence of the algorithm. To resolve this
problem, one may choose to set $\tilde{\mu}(n)$ equal to 0, if Eq. (15.22) leads to a negative $\tilde{\mu}(n)$.
Table 15.1 summarizes the NLMS algorithm that was introduced in this chapter. We
refer to this as the variable step-size normalized least-mean-square (VSNLMS) algorithm, to
differentiate it from its original version that was presented in Chapter 6, Section 6.6.
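
As a rough illustration of how the step-size rule behaves, the sketch below follows Eqs. (15.20)-(15.22) as they appear in the text, with negative values set to 0. It is a plain-Python toy, not the book's MATLAB implementation: the class name `VariableStep`, the forgetting factor 0.99, the small `floor` that guards the division, the white-noise input, and the three-tap path are all assumptions chosen for the example.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


class VariableStep:
    """Tracks the power estimates of (15.20)-(15.21) and returns the step of (15.22)."""

    def __init__(self, eta=0.1, alpha=0.99, floor=1e-12):
        self.eta = eta        # design parameter, 0 < eta < 1
        self.alpha = alpha    # forgetting factor of the short-term averages (assumed)
        self.floor = floor    # guards the division before any signal has arrived
        self.sig_d = 0.0      # running estimate of E[d^2(n)]
        self.sig_y = 0.0      # running estimate of E[y_hat^2(n)]

    def update(self, d, y_hat):
        a = self.alpha
        self.sig_d = a * self.sig_d + (1.0 - a) * d * d
        self.sig_y = a * self.sig_y + (1.0 - a) * y_hat * y_hat
        mu = 1.0 - self.eta * self.sig_y / max(self.sig_d, self.floor)
        return max(mu, 0.0)   # a negative step size is replaced by 0


def run_vsnlms(w_o, n_iter=5000, eta=0.1, noise_std=0.01, psi=1e-6, seed=1):
    """Run the variable step-size NLMS on a toy identification problem."""
    rng = random.Random(seed)
    n_taps = len(w_o)
    w, x = [0.0] * n_taps, [0.0] * n_taps
    step = VariableStep(eta=eta)
    mu = 1.0
    for _ in range(n_iter):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]
        d = dot(w_o, x) + rng.gauss(0.0, noise_std)   # echo plus weak near-end noise
        y_hat = dot(w, x)
        e = d - y_hat
        mu = step.update(d, y_hat)
        gain = mu / (dot(x, x) + psi)
        w = [wi + gain * e * xi for wi, xi in zip(w, x)]
    return w, mu


if __name__ == "__main__":
    taps, final_mu = run_vsnlms([0.5, -0.3, 0.1])
    print(taps, final_mu)
```

In this toy run the step size starts at 1 while the filter output is still zero and drifts toward 1 - eta once the estimate tracks the microphone signal, which is the behavior described in the two properties listed above.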


15.2.2 Affine Projection LMS (APLMS) Algorithm 


In Chapter 6, we introduced the APLMS (affine projection least-mean-square) algorithm
as a generalization of the NLMS algorithm. We also noted that the APLMS algorithm may
be thought of as an approximation to the LMS-Newton algorithm and, hence, has a superior
convergence behavior over the NLMS algorithm. However, an APLMS algorithm may
suffer from a larger misadjustment/misalignment than its NLMS counterpart. Irrespective
of this potential problem, a number of researchers have considered the APLMS
algorithm a better choice than the NLMS algorithm in the implementation of echo cancellers.
Numerical examples that substantiate this claim are presented in Section 15.2.6.

To bring the APLMS algorithm on par with its VSNLMS counterpart, we also develop
a variable step-size version of it. To this end, we recall the APLMS recursion (6.133), let
$\psi = 0$, and note that using the definition $\mathbf{v}(n) = \mathbf{w}(n) - \mathbf{w}_o$ we obtain

$$\mathbf{v}(n+1) = \mathbf{v}(n) + \tilde{\mu}\,\mathbf{X}(n)\big(\mathbf{X}^{\mathrm{T}}(n)\mathbf{X}(n)\big)^{-1}\mathbf{e}(n) \tag{15.23}$$

where, as defined in Chapter 6,

$$\mathbf{X}(n) = [\mathbf{x}(n)\;\; \mathbf{x}(n-1)\;\; \cdots\;\; \mathbf{x}(n-M+1)] \tag{15.24}$$

$$\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{X}^{\mathrm{T}}(n)\mathbf{w}(n) \tag{15.25}$$

and

$$\mathbf{d}(n) = [d(n)\;\; d(n-1)\;\; \cdots\;\; d(n-M+1)]^{\mathrm{T}} \tag{15.26}$$


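Recall that M is an integer parameter and usually M < N.

Table 15.1 Summary of the variable step-size normalized LMS (VSNLMS) algorithm
for echo cancellation.

Input:   Tap-weight vector, $\mathbf{w}(n)$,
         input vector, $\mathbf{x}(n)$, desired output, $d(n)$,
         and power estimates $\sigma_d^2(n-1)$, $\sigma_{\hat{y}}^2(n-1)$, and $\sigma_e^2(n-1)$
Output:  Tap-weight vector update, $\mathbf{w}(n+1)$,
         power estimates $\sigma_d^2(n)$, $\sigma_{\hat{y}}^2(n)$, and $\sigma_e^2(n)$
         The echo canceller output is $e(n)$.

1. Filtering:
   $\hat{y}(n) = \mathbf{w}^{\mathrm{T}}(n)\mathbf{x}(n)$
2. Error estimation:
   $e(n) = d(n) - \hat{y}(n)$
3. Step-size calculation:
   $\sigma_d^2(n) = \alpha\,\sigma_d^2(n-1) + (1-\alpha)\,d^2(n)$
   $\sigma_{\hat{y}}^2(n) = \alpha\,\sigma_{\hat{y}}^2(n-1) + (1-\alpha)\,\hat{y}^2(n)$
   $\sigma_e^2(n) = \alpha\,\sigma_e^2(n-1) + (1-\alpha)\,e^2(n)$
   $\tilde{\mu}(n) = 1 - \dfrac{\sigma_d^2(n) - \sigma_{\hat{y}}^2(n)}{\sigma_e^2(n)}$
   if $\tilde{\mu}(n) > 1$, let $\tilde{\mu}(n) = 1$
   if $\tilde{\mu}(n) < 0$, let $\tilde{\mu}(n) = 0$
4. Tap-weight vector adaptation:
   $\mathbf{w}(n+1) = \mathbf{w}(n) + \dfrac{\tilde{\mu}(n)}{\mathbf{x}^{\mathrm{T}}(n)\mathbf{x}(n) + \psi}\, e(n)\mathbf{x}(n)$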


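Following the same lines of argument that led to the derivation of Eq. (15.11) from
Eq. (15.8), we obtain from Eq. (15.23)

$$E[\|\mathbf{v}(n+1)\|^2] = \|\mathbf{v}(n)\|^2 + \tilde{\mu}^2\,E\big[\mathbf{e}^{\mathrm{T}}(n)\big(\mathbf{X}^{\mathrm{T}}(n)\mathbf{X}(n)\big)^{-1}\mathbf{e}(n)\big] - 2\tilde{\mu}\,\mathbf{r}^{\mathrm{T}}(n)\big(\mathbf{X}^{\mathrm{T}}(n)\mathbf{X}(n)\big)^{-1}\mathbf{r}(n) \tag{15.27}$$

where $\mathbf{r}(n) = \mathbf{X}^{\mathrm{T}}(n)\mathbf{v}(n)$ is the vector of residual error, an extension of Eq. (15.10).
Now solving Eq. (15.12), for the present case, we obtain

$$\tilde{\mu}_{\mathrm{opt}}(n) = \frac{\mathbf{r}^{\mathrm{T}}(n)\big(\mathbf{X}^{\mathrm{T}}(n)\mathbf{X}(n)\big)^{-1}\mathbf{r}(n)}{E\big[\mathbf{e}^{\mathrm{T}}(n)\big(\mathbf{X}^{\mathrm{T}}(n)\mathbf{X}(n)\big)^{-1}\mathbf{e}(n)\big]} \tag{15.28}$$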
This is somewhat similar to the first line in Eq. (15.13); however, it has a more complex
form because of the presence of the matrix $\big(\mathbf{X}^{\mathrm{T}}(n)\mathbf{X}(n)\big)^{-1}$. Nevertheless, the comments
made following Eq. (15.13) are also applicable here. Noting this and the fact that a direct
implementation of Eq. (15.28) is not possible in practice, we propose to use Eq. (15.22) for
the APLMS algorithm as well and thus summarize the variable step-size affine projection
least mean square (VSAPLMS) algorithm as in Table 15.2.

Table 15.2 Summary of the variable step-size APLMS (VSAPLMS) algorithm for echo
cancellation.

Input:   Tap-weight vector, $\mathbf{w}(n)$,
         input vector, $\mathbf{x}(n)$, desired output, $d(n)$,
         and power estimates $\sigma_d^2(n-1)$, $\sigma_{\hat{y}}^2(n-1)$, and $\sigma_e^2(n-1)$
Output:  Tap-weight vector update, $\mathbf{w}(n+1)$,
         power estimates $\sigma_d^2(n)$, $\sigma_{\hat{y}}^2(n)$, and $\sigma_e^2(n)$
         The echo canceller output is $e(n)$.

1. Filtering:
   $\hat{\mathbf{y}}(n) = \mathbf{X}^{\mathrm{T}}(n)\mathbf{w}(n)$, $\hat{y}(n)$ = the first element of $\hat{\mathbf{y}}(n)$
2. Error estimation:
   $\mathbf{e}(n) = \mathbf{d}(n) - \hat{\mathbf{y}}(n)$, $e(n)$ = the first element of $\mathbf{e}(n)$
3. Step-size calculation:
   $\sigma_d^2(n) = \alpha\,\sigma_d^2(n-1) + (1-\alpha)\,d^2(n)$
   $\sigma_{\hat{y}}^2(n) = \alpha\,\sigma_{\hat{y}}^2(n-1) + (1-\alpha)\,\hat{y}^2(n)$
   $\sigma_e^2(n) = \alpha\,\sigma_e^2(n-1) + (1-\alpha)\,e^2(n)$
   $\tilde{\mu}(n) = 1 - \dfrac{\sigma_d^2(n) - \sigma_{\hat{y}}^2(n)}{\sigma_e^2(n)}$
   if $\tilde{\mu}(n) > 1$, let $\tilde{\mu}(n) = 1$
   if $\tilde{\mu}(n) < 0$, let $\tilde{\mu}(n) = 0$
4. Tap-weight vector adaptation:
   $\mathbf{w}(n+1) = \mathbf{w}(n) + \tilde{\mu}(n)\,\mathbf{X}(n)\big(\mathbf{X}^{\mathrm{T}}(n)\mathbf{X}(n) + \psi\mathbf{I}\big)^{-1}\mathbf{e}(n)$
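
The tap-weight update of the APLMS recursion can be sketched as below for a fixed step size. This is a small plain-Python illustration under stated assumptions: the helper `solve` (Gaussian elimination) stands in for the matrix inverse and is practical only for small M, and the names `aplms_step`, `cols`, and `d_vec` are chosen for the example rather than taken from the book's code.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def solve(a_mat, b):
    """Solve a_mat z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a_mat, b)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][k] * z[k] for k in range(r + 1, m))
        z[r] = (aug[r][m] - s) / aug[r][r]
    return z


def aplms_step(w, cols, d_vec, mu=0.5, psi=1e-6):
    """One APLMS iteration.

    cols  : the M columns of X(n), i.e. x(n), x(n-1), ..., x(n-M+1)
    d_vec : d(n), d(n-1), ..., d(n-M+1)
    Returns the updated tap weights and the a priori error vector e(n).
    """
    m = len(cols)
    e = [d_vec[i] - dot(cols[i], w) for i in range(m)]      # e = d - X^T w
    gram = [[dot(cols[i], cols[j]) + (psi if i == j else 0.0)
             for j in range(m)] for i in range(m)]          # X^T X + psi I
    z = solve(gram, e)                                      # (X^T X + psi I)^{-1} e
    w_new = [wk + mu * sum(cols[i][k] * z[i] for i in range(m))
             for k, wk in enumerate(w)]
    return w_new, e
```

With M = 1 the update reduces to the NLMS step, and with a step size of 1 and a negligible `psi` the a posteriori errors on all M constraints are driven to (nearly) zero, which is one way to sanity-check an implementation.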


15.2.3 Frequency Domain Block LMS Algorithm 


The frequency domain/fast block least mean square (FBLMS) algorithm was introduced
in Chapter 8. It was noted that the step-normalization at different frequency bins
resolves the problem of eigenvalue spread and thus results in a fast converging algorithm
with a single mode of convergence. We also noted that for very long adaptive filters, such
as those of interest in this chapter, the FBLMS algorithm may introduce an unacceptably
long latency at the output; that is, the output samples are generated with a significant
delay. The partitioned frequency domain/fast block least mean square (PFBLMS) algorithm was
thus proposed as a method of reducing the latency, with no noticeable degradation in
performance.

Here, to cope with the nonstationary nature of the speech signals, we consider a
version of the PFBLMS algorithm that follows the NLMS algorithm formulation for the
step-normalization at each frequency bin. We also adopt the method that was developed
for the NLMS algorithm to control the step-size parameter of the PFBLMS algorithm. A
summary of this version of the PFBLMS algorithm, which we call the variable step-size normalized
partitioned frequency domain/fast block least mean square (VSNPFBLMS) algorithm, is
presented in Table 15.3.


15.2.4 Subband LMS Algorithm 


The subband LMS algorithm is another form of the LMS algorithm that has been proposed
as an effective method of implementing high performance AE cancellers. As discussed
in Chapter 9, to resolve and equalize the various modes of convergence of the LMS
algorithm, the input signal, x(n), and the desired output signal, d(n), are partitioned
into a number of narrowband signals. The narrowband partitions from x(n) and d(n) are
matched using a set of subband adaptive filters. The outputs of these filters are combined
through a synthesis filter bank to construct a full-band (filtered) signal that matches d(n).

We also recall from Chapter 9 that there are two possible structures for the
subband adaptive filters: the synthesis-independent structure, Figure 9.7, and the
synthesis-dependent structure, Figure 9.8. The latter may provide a more accurate output; however,
the delay in the adaptation loop results in a less stable and also a slower adaptation algorithm
(see the discussion in Section 9.3). Because of these problems, the synthesis-dependent
subband adaptive filters are less popular. For echo cancellers also, we limit our attention
to the synthesis-independent structure.

We also note that in the case of echo cancellers, the output signal of interest is the error
signal e(n). Noting this, the synthesis-independent structure of Figure 9.7 is modified as
in Figure 15.3. The subband AE cancellation results that are presented in Section 15.2.6
are those of this structure. The tap weights of the subband adaptive filters $W_0(z)$ through
$W_{M-1}(z)$ are adjusted using the NLMS algorithm with a common, and possibly
time-varying, step-size parameter $\tilde{\mu}(n)$.


15.2.5 LMS-Newton Algorithm 


The last modification to the LMS algorithm that was presented in the previous chapters was
an approximation to the LMS-Newton algorithm based on autoregressive modeling of the
input signal x(n). In Section 11.15, two versions of this approximation were presented:
Algorithm 1 and Algorithm 2. We also noted that although Algorithm 1 was a better
approximation to the LMS-Newton algorithm, Algorithm 2 was more appealing because
of its simple structure.

We note that the use of the approximate LMS-Newton algorithms of Section 11.15 in
the application of echo cancellers is very appealing. Firstly, the vast studies of speech
signals in the past have shown that autoregressive modeling is an excellent fit to
speech signals. Moreover, a model order of 8 to 12 (typically, 10) is commonly used for
speech signals in the celebrated linear predictive coding systems. Secondly, the number
of taps in typical AE cancellers is in excess of 1000. These imply that the linear modeling
part constitutes a negligible portion of the complexity of the approximate LMS-Newton
algorithms. Hence, the proposed LMS-Newton algorithms have a complexity that is only
marginally higher than that of the conventional LMS/NLMS algorithm.

Table 15.3 Summary of the variable step-size normalized PFBLMS (VSNPFBLMS) algorithm
for echo cancellation.

Input:   Tap-weight vectors, $\mathbf{w}_{F,l}(k)$, $l = 0, 1, \ldots, P-1$,
         extended input vector,
         $\tilde{\mathbf{x}}_0(k) = [x(kL-M)\;\; x(kL-M+1)\;\; \cdots\;\; x(kL+L-1)]^{\mathrm{T}}$,
         the past frequency domain vectors of input, $\mathbf{x}_{F,0}(k-l)$, for $l = 1, 2, \ldots, (P-1)p$,
         and the desired output vector,
         $\mathbf{d}(k) = [d(kL)\;\; d(kL+1)\;\; \cdots\;\; d(kL+L-1)]^{\mathrm{T}}$.
Output:  Tap-weight vector update, $\mathbf{w}_{F,l}(k+1)$, $l = 0, 1, \ldots, P-1$,
         power estimates $\sigma_d^2(k)$, $\sigma_{\hat{y}}^2(k)$, and $\sigma_e^2(k)$
         The echo canceller output is $e(n)$.

1. Filtering:
   $\mathbf{x}_{F,0}(k) = \mathrm{FFT}(\tilde{\mathbf{x}}_0(k))$
   $\hat{\mathbf{y}}(k)$ = the last L elements of $\mathrm{IFFT}\Big(\sum_{l=0}^{P-1} \mathbf{w}_{F,l}(k) \otimes \mathbf{x}_{F,0}(k-pl)\Big)$
2. Error estimation:
   $\mathbf{e}(k) = \mathbf{d}(k) - \hat{\mathbf{y}}(k)$
3. Step-size calculation:
   $\sigma_d^2(k) = \alpha\,\sigma_d^2(k-1) + (1-\alpha)\big(\mathbf{d}^{\mathrm{T}}(k)\mathbf{d}(k)\big)$
   $\sigma_{\hat{y}}^2(k) = \alpha\,\sigma_{\hat{y}}^2(k-1) + (1-\alpha)\big(\hat{\mathbf{y}}^{\mathrm{T}}(k)\hat{\mathbf{y}}(k)\big)$
   $\sigma_e^2(k) = \alpha\,\sigma_e^2(k-1) + (1-\alpha)\big(\mathbf{e}^{\mathrm{T}}(k)\mathbf{e}(k)\big)$
   $\tilde{\mu}(k) = 1 - \dfrac{\sigma_d^2(k) - \sigma_{\hat{y}}^2(k)}{\sigma_e^2(k)}$
   if $\tilde{\mu}(k) > 1$, let $\tilde{\mu}(k) = 1$
   if $\tilde{\mu}(k) < 0$, let $\tilde{\mu}(k) = 0$
   for $i = 0$ to $M'-1$
       $\mu_i(k) = \dfrac{\tilde{\mu}(k)}{\sum_{l=0}^{P-1} |x_{F,0,i}(k-pl)|^2}$
   $\boldsymbol{\mu}(k) = [\mu_0(k)\;\; \mu_1(k)\;\; \cdots\;\; \mu_{M'-1}(k)]^{\mathrm{T}}$
4. Tap-weight adaptation:
   $\mathbf{e}_F(k) = \mathrm{FFT}\left(\begin{bmatrix}\mathbf{0} \\ \mathbf{e}(k)\end{bmatrix}\right)$
   for $l = 0$ to $P-1$
       $\mathbf{w}_{F,l}(k+1) = \mathbf{w}_{F,l}(k) + 2\boldsymbol{\mu}(k) \otimes \mathbf{x}_{F,0}^{*}(k-pl) \otimes \mathbf{e}_F(k)$
5. Tap-weight constraint:
   for $l = 0$ to $P-1$
       $\mathbf{w}_{F,l}(k+1) = \mathrm{FFT}\left(\begin{bmatrix}\text{the first } M \text{ elements of } \mathrm{IFFT}(\mathbf{w}_{F,l}(k+1)) \\ \mathbf{0}\end{bmatrix}\right)$

Notes:
   M: partition length; L: block length; $M' = M + L$.
   $\mathbf{0}$ denotes column zero vectors with appropriate length to extend vectors to the length of $M'$.
   $\otimes$ denotes elementwise multiplication of vectors.
   Step 5 is applicable only for the constrained PFBLMS algorithm.

Figure 15.3 Schematic diagram of a subband synthesis-independent echo canceller.


Here, because of its simple structure, we limit our discussion to Algorithm 2 of
Section 11.15. Also, to take care of the nonstationary nature of the speech signals, we
develop a normalized version of the algorithm. Referring back to Chapter 11, Table 11.7,
we recall the tap-weight update equation

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\, e(n)\mathbf{u}_a(n) \tag{15.29}$$

where, as discussed in Section 11.15, $\mathbf{u}_a(n)$ is an approximation to $\mathbf{R}^{-1}\mathbf{x}(n-M)$. Following
the same line of derivation as that which led to Eq. (6.119), we begin by defining
the a posteriori error

$$e^{+}(n) = d(n) - \mathbf{w}^{\mathrm{T}}(n+1)\mathbf{x}(n-M) \tag{15.30}$$

Substituting Eq. (15.29) in Eq. (15.30), replacing $\mu$ by $\mu(n)$, and rearranging, we obtain

$$e^{+}(n) = \big(1 - 2\mu(n)\,\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M)\big)\,e(n) \tag{15.31}$$

Next, as in the case of the NLMS algorithm, minimizing $(e^{+}(n))^2$ with respect to $\mu(n)$
results in

$$\mu(n) = \frac{1}{2\,\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M)} \tag{15.32}$$

This forces $e^{+}(n)$ to 0. Substituting Eq. (15.32) in Eq. (15.29), we obtain

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{1}{\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M)}\, e(n)\mathbf{u}_a(n) \tag{15.33}$$

This is the dual of the NLMS algorithm of Eq. (6.107). Following the same argument as
in Section 6.6, one may replace Eq. (15.33) by its regularized form

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\tilde{\mu}}{\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M) + \psi}\, e(n)\mathbf{u}_a(n) \tag{15.34}$$

where $\tilde{\mu}$ is a step-size parameter, taking values in the range of greater than 0 and less
than or equal to 1, and $\psi$ is a small positive constant that avoids numerical
instability of the algorithm when $\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M)$ approaches 0.

We note that considering the identity $\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M) = \mathbf{x}^{\mathrm{T}}(n-M)\mathbf{u}_a(n)$ and using the
approximation $\mathbf{u}_a(n) \approx \mathbf{R}^{-1}\mathbf{x}(n-M)$, we obtain

$$\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M) \approx \mathbf{x}^{\mathrm{T}}(n-M)\mathbf{R}^{-1}\mathbf{x}(n-M) \tag{15.35}$$

Since $\mathbf{R}$ and, hence, $\mathbf{R}^{-1}$ is a positive definite matrix, $\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M)$ is most likely a
positive number. However, given that Eq. (15.35) is an approximation, there is still a small
chance that $\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M)$ is a negative number and thus the algorithm takes a step
in a wrong direction. To avoid this problem, one may choose not to update $\mathbf{w}(n)$ when
$\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M)$ is negative. Also, one may choose to adjust $\tilde{\mu}(n)$ as in the NLMS/APLMS
algorithm.

Considering the above changes/modifications, the variable step-size NLMS-Newton 
algorithm that we suggest for echo cancellation is listed in Table 15.4. 
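
The regularized update, together with the rule of skipping the update when the inner product is negative, can be sketched as follows. This is a plain-Python illustration only: the lattice part that produces the vector `u_a` is not reproduced, so `u_a` and the delayed input vector `x_del` are taken as given, and the function name `nlms_newton_step` is an assumption of the example.

```python
def nlms_newton_step(w, u_a, x_del, e, mu=0.5, psi=1e-6):
    """Normalized LMS-Newton tap-weight update with a sign check.

    w     : current tap weights
    u_a   : approximation to R^{-1} x(n - M), assumed to be supplied by the caller
    x_del : delayed input vector x(n - M)
    e     : a priori output error e(n)
    """
    q = sum(ui * xi for ui, xi in zip(u_a, x_del))   # u_a^T(n) x(n - M)
    if q < 0.0:
        return list(w)                               # skip a step in the wrong direction
    gain = mu / (q + psi)
    return [wi + gain * e * ui for wi, ui in zip(w, u_a)]


if __name__ == "__main__":
    # With R equal to the identity, u_a = x and the step reduces to NLMS.
    print(nlms_newton_step([0.0, 0.0], [1.0, 2.0], [1.0, 2.0], 3.0, mu=1.0, psi=0.0))
```

When the correlation matrix is the identity the vector `u_a` coincides with the input vector and the update is the ordinary NLMS step, which gives a quick consistency check.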


15.2.6 Numerical Results 


In this section, we present some numerical results to evaluate and compare the relative
performance of the various echo cancellation algorithms that were presented in the
previous sections. The experiments are for the AE cancellation setup of Figure 15.2. Although
our study concentrates on an AE canceller setup, the conclusions derived are equally
applicable to HE cancellers.

As the far-end signal x(n), we have selected eight pieces of speech signals from the
National Public Radio (NPR) in the United States, and from the British Broadcasting
Corporation (BBC) Radio in the United Kingdom. Male and female speakers are selected to
make sure that there is some diversity in the speech samples. These sampled signals
are available on the accompanying website, as MATLAB .mat files. They are named
"speech1.mat" through "speech8.mat." The sampling rate is set at 8820 Hz. For
the results presented in this section, we have used "speech1.mat." However, the reader
may repeat the experiments with other speech signals. The MATLAB codes for all the
echo canceller algorithms are available on the accompanying website.

The near-end room echo response is generated randomly by first choosing the random 
column vector g consisting of the elements 


g,=(n +N) nan), n=0,1,...,N—1 (15.36) 


where n(n) is a Gaussian random sequence with unit variance and independent samples. 
The vector $\mathbf{g}$ is then normalized to obtain the desired echo response vector

$$\mathbf{h} = \frac{\mathbf{g}}{\sqrt{\mathbf{g}^{\mathrm{T}}\mathbf{g}}} \tag{15.37}$$

Table 15.4 A variable step-size NLMS-Newton algorithm for echo cancellers.

Given:     Parameter vectors $\boldsymbol{\kappa}(n) = [\kappa_1(n)\;\; \kappa_2(n)\;\; \cdots\;\; \kappa_M(n)]^{\mathrm{T}}$
           and $\mathbf{w}(n) = [w_0(n)\;\; w_1(n)\;\; \cdots\;\; w_{N-1}(n)]^{\mathrm{T}}$,
           data vectors $\mathbf{x}(n)$, $\mathbf{b}(n-1)$, and $\mathbf{u}_a(n-1)$, desired output $d(n)$,
           and power estimates $P_0(n-1), P_1(n-1), \ldots, P_M(n-1)$
Required:  Vector updates $\boldsymbol{\kappa}(n+1)$, $\mathbf{w}(n+1)$, $\mathbf{b}(n)$, and $\mathbf{u}_a(n)$,
           and power estimate updates $P_0(n), P_1(n), \ldots, P_M(n)$
           The echo canceller output is $e(n)$.

*** Lattice Predictor Part ***

   $f_0(n) = b_0(n) = x(n)$
   $P_0(n) = \beta P_0(n-1) + 0.5(1-\beta)[f_0^2(n) + b_0^2(n-1)]$
   for $m = 1$ to $M$
       $f_m(n) = f_{m-1}(n) - \kappa_m(n)\,b_{m-1}(n-1)$
       $b_m(n) = b_{m-1}(n-1) - \kappa_m(n)\,f_{m-1}(n)$
       $\kappa_m(n+1) = \kappa_m(n) + \dfrac{2\mu_{p,o}}{P_{m-1}(n) + \epsilon}\,[f_{m-1}(n)\,b_m(n) + b_{m-1}(n-1)\,f_m(n)]$
       $P_m(n) = \beta P_m(n-1) + 0.5(1-\beta)[f_m^2(n) + b_m^2(n-1)]$
       if $|\kappa_m(n)| > \gamma$, $\kappa_m(n) = \kappa_m(n-1)$
   end

*** $\mathbf{u}_a(n)$ update ***

   $u_a(n-j) = u_a(n-j+1)$, for $j = N-1, N-2, \ldots, 2$
   $f'_0(n) = b'_0(n) = b_M(n)$
   for $m = 1$ to $M-1$
       $f'_m(n) = f'_{m-1}(n) - \kappa_m(n)\,b'_{m-1}(n-1)$
       $b'_m(n) = b'_{m-1}(n-1) - \kappa_m(n)\,f'_{m-1}(n)$
   end
   $u_a(n) = (P_M(n) + \epsilon)^{-1}\big(f'_{M-1}(n) - \kappa_M(n)\,b'_{M-1}(n-1)\big)$

*** Filtering and tap-weight vector adaptation ***

1. Filtering:
   $\hat{y}(n) = \mathbf{w}^{\mathrm{T}}(n)\mathbf{x}(n-M)$
2. Error estimation:
   $e(n) = d(n-M) - \hat{y}(n)$
3. Step-size calculation:
   $\sigma_d^2(n) = \alpha\,\sigma_d^2(n-1) + (1-\alpha)\,d^2(n)$
   $\sigma_{\hat{y}}^2(n) = \alpha\,\sigma_{\hat{y}}^2(n-1) + (1-\alpha)\,\hat{y}^2(n)$
   $\sigma_e^2(n) = \alpha\,\sigma_e^2(n-1) + (1-\alpha)\,e^2(n)$
   $\tilde{\mu}(n) = 1 - \dfrac{\sigma_d^2(n) - \sigma_{\hat{y}}^2(n)}{\sigma_e^2(n)}$
   if $\tilde{\mu}(n) > 1$, let $\tilde{\mu}(n) = 1$
   if $\tilde{\mu}(n) < 0$, let $\tilde{\mu}(n) = 0$
4. Tap-weight vector adaptation:
   $\mathbf{w}(n+1) = \mathbf{w}(n) + \dfrac{\tilde{\mu}(n)}{\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M) + \psi}\, e(n)\mathbf{u}_a(n)$
   (ignore this step if $\mathbf{u}_a^{\mathrm{T}}(n)\mathbf{x}(n-M) < 0$)

Figure 15.4 A sample of the echo response $h_n$.

For the results presented in this section, we have chosen $N_0 = 150$ and $N = 1024$. The
echoed signal is thus given by

$$y(n) = h_n \star x(n) \tag{15.38}$$

where the $h_n$'s are the elements of $\mathbf{h}$ and $\star$ denotes convolution. A sample of the
echo response $h_n$ is presented in Figure 15.4. We have used this response for generation
of all the results in this section.
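
A small sketch of the last two steps, normalizing a response and convolving it with the far-end signal, is given below. It assumes that the normalization scales the response to unit Euclidean norm. The decaying envelope used in `random_response` is a hypothetical stand-in chosen for illustration and is not the book's expression for the elements of g; the function names and the short response length are likewise assumptions.

```python
import math
import random


def unit_norm(g):
    """Scale g to unit Euclidean norm (assumed reading of the normalization step)."""
    s = math.sqrt(sum(v * v for v in g))
    return [v / s for v in g]


def echo(h, x):
    """Echoed signal y(n) = sum_k h_k x(n - k), for n = 0, ..., len(x) - 1."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]


def random_response(n_taps=64, decay=20.0, seed=0):
    """Hypothetical random response: Gaussian samples under a decaying envelope."""
    rng = random.Random(seed)
    g = [math.exp(-n / decay) * rng.gauss(0.0, 1.0) for n in range(n_taps)]
    return unit_norm(g)


if __name__ == "__main__":
    h = random_response()
    x = [1.0] + [0.0] * 9          # a unit impulse returns the first taps of h
    print(echo(h, x))
```

Feeding a unit impulse through `echo` returns the leading taps of the response, which is a convenient check before substituting a speech signal for `x`.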

We use the echo return loss enhancement (ERLE), defined as

$$\mathrm{ERLE} = 10\log_{10}\left(\frac{E[y^2(n)]}{E[r^2(n)]}\right) \tag{15.39}$$

to evaluate the performance of the various adaptive algorithms. Recall that $r(n) = y(n) - \hat{y}(n)$
is the residual echo within the error/transmit signal $e(n)$. Ideally, the ERLE should
approach infinity as the estimated echo $\hat{y}(n)$ approaches the true echo $y(n)$. However, the
nonzero value of $e(n)$, in the steady state, leads to a perturbation of the echo canceller
tap weights around their steady-state value, and hence, does not allow the ERLE to grow
indefinitely. Recall that in the context of adaptive filters theory, the perturbation of the
tap weights in the steady state is measured by misadjustment, $\mathcal{M}$, which is defined as the
ratio of excess MSE over the minimum mean-squared error (MMSE). Furthermore, we
note that the presence of a double-talk may increase the MMSE significantly; thus, in the
presence of a double-talk, the perturbation of the echo canceller tap weights can be very
large. This, in turn, may reduce the ERLE significantly.
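
In a simulation, where the true echo is known, the ERLE can be estimated over a block of samples by replacing the expectations with sums. The sketch below assumes the ERLE is taken as the ratio of echo power to residual-echo power; the function name `erle_db` and the block-based averaging are choices made for this example.

```python
import math


def erle_db(y, y_hat):
    """Block estimate of the ERLE in dB from the true echo y and its estimate y_hat."""
    echo_power = sum(v * v for v in y)
    residual_power = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    if residual_power == 0.0:
        return math.inf                       # perfect cancellation
    return 10.0 * math.log10(echo_power / residual_power)


if __name__ == "__main__":
    # A residual that is one tenth of the echo amplitude corresponds to about 20 dB.
    print(erle_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9]))
```

To produce curves of ERLE against time, the same function can be applied to successive short blocks of the signals.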

Figure 15.5 presents a set of ERLE plots that compare the various adaptive
algorithms. These results have been collected using the MATLAB codes available on the
accompanying website (www.wiley.com/go/adaptive_filters). For the results presented in
Figure 15.5, the far-end signal is "speech1.mat." Here, we have kept the variable
step-size part of the algorithms inactive; that is, we have used a fixed step-size parameter $\tilde{\mu}$.
The parameters of the various algorithms that have been used for the results presented in
Figure 15.5 are listed in Table 15.5. There is no near-end (double-talk) signal. However,
a background Gaussian noise with a standard deviation of $\sigma_o = 0.001$ has been added at
the microphone. This is about 45 dB below the near-end echoed signal that reaches the
microphone. Also, as an alternative way of observing the performance of the various
algorithms, the error (transmit) signal e(n) produced by the examined algorithms, along with
the microphone signal (i.e., before the echo cancellation), are presented in Figure 15.6.

Figure 15.5 Echo return loss enhancement (ERLE) of the various algorithms: (a) NLMS,
(b) APLMS, (c) FBLMS, (d) SBLMS, and (e) LMS-Newton.

Table 15.5 The parameters of the various algorithms that have been used for the results
presented in Figure 15.5.

Algorithm hw N MI Ly a, K, KN, No Bo Y € Wo
NLMS 0.5 0.1 1024
APLMS 0.5 0.1 1024 5
FBLMS 0.5 0.1 1024 64 64
SBLMS 0.5 0.1 1024 32 19 - 3/16 7/19 5 3 191 193 - - - -
LMS-Newton 0.5 0.1 1024 8 0.98 0.9 0.05 0.001

Figure 15.6 Signal waveforms (a) before cancellation, and after cancellation: (b) NLMS,
(c) APLMS, (d) FBLMS, (e) SBLMS, and (f) LMS-Newton. Note that the vertical scale (amplitude)
in (a) is different from the rest.

From the results presented in Figures 15.5 and 15.6, and further tests that have been
performed with other samples of speech signals as well as other choices of the parameters
of the various algorithms, the following observations have been made.

• The APLMS has the fastest rate of convergence. This may be understood, if we recall
  that the APLMS algorithm lies between the conventional LMS algorithm and the RLS
  algorithm. Its performance approaches that of the RLS algorithm, as the parameter M
  approaches the filter length N. However, because of the reasons discussed in Chapter 6,
  the APLMS algorithm suffers from a relatively high misadjustment. This drawback
  of the APLMS algorithm is reflected in its ERLE, which is lower than those of the
  NLMS, LMS-Newton, and FBLMS algorithms.

• The LMS-Newton algorithm, on the other hand, besides having similar or better
  convergence compared to the rest of the algorithms, achieves the highest level of ERLE.
  Moreover, a subjective test that may be performed by listening to the error signal
  e(n) confirms that the LMS-Newton has a superior performance over all the
  algorithms that are tested here.

• The subband-LMS algorithm appears to be the most inferior adaptation algorithm in
  terms of the achievable ERLE. It has given an ERLE that is about 5 to 10 dB below the
  rest of the algorithms. This excessive loss in performance is due to the imperfect filters that
  have been used for subband analysis and synthesis. The performance of the algorithm
  could be improved by using better filters; however, such improvement will be at a cost
  of an excessive latency in the echo canceller output that may be unacceptable.

• The presence of some slow modes of convergence in the NLMS algorithm is
  very evident.


15.3 Double-Talk Detection 


Double-talk detection algorithms fall under the general category of
correlation/coherence-based schemes. In this section, we present the general idea behind this class
of algorithms and provide the details of one implementation. This presentation
follows the results that were first reported in Gänsler et al. (1996). The work
presented in Benesty et al. (2000) gives a different point of view of the same concept.
The additional study performed here reveals that a practically appealing implementation
of the correlation/coherence-based schemes leads to a double-talk detection strategy that
is closely related to the variable step-size NLMS that was developed in Section 15.2.1.


15.3.1 Coherence Function 


The coherence function between a pair of random processes u(n) and v(n) is defined as

$$c_{vu}(e^{j\omega}) = \frac{|\Phi_{vu}(e^{j\omega})|^2}{\Phi_{uu}(e^{j\omega})\,\Phi_{vv}(e^{j\omega})} \tag{15.40}$$

where $\Phi_{uu}(e^{j\omega})$, $\Phi_{vv}(e^{j\omega})$, and $\Phi_{vu}(e^{j\omega})$ are the power and cross-power spectral density
functions defined in Chapter 2.

When u(n) and v(n) are related through a linear time invariant system (e.g., v(n) is
the output of a system with input u(n) and transfer function $H(e^{j\omega})$), using the results
presented in Chapter 2, one will find that

$$\Phi_{vu}(e^{j\omega}) = \Phi_{uu}(e^{j\omega})\,H(e^{j\omega}) \tag{15.41}$$

and

$$\Phi_{vv}(e^{j\omega}) = \Phi_{uu}(e^{j\omega})\,|H(e^{j\omega})|^2 \tag{15.42}$$

Substituting Eqs. (15.41) and (15.42) in Eq. (15.40) and simplifying the result, we will
find that

$$c_{vu}(e^{j\omega}) = 1 \tag{15.43}$$

This identity may be thought of as a way of reflecting the fact that u(n) and v(n) are coherently
related through a linear time invariant transfer function. In that case, we say there is
full coherence between u(n) and v(n).
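
The simplification that gives full coherence is short enough to write out. The lines below restate the substitution of the input-output relations for the spectral densities into the definition of the coherence function, assuming $\Phi_{uu}(e^{j\omega})$ and $H(e^{j\omega})$ are nonzero at the frequency of interest.

```latex
\begin{align*}
c_{vu}(e^{j\omega})
  &= \frac{|\Phi_{uu}(e^{j\omega})\,H(e^{j\omega})|^2}
          {\Phi_{uu}(e^{j\omega}) \cdot \Phi_{uu}(e^{j\omega})\,|H(e^{j\omega})|^2}
   = \frac{\Phi_{uu}^2(e^{j\omega})\,|H(e^{j\omega})|^2}
          {\Phi_{uu}^2(e^{j\omega})\,|H(e^{j\omega})|^2}
   = 1 .
\end{align*}
```

Any component of v(n) that is uncorrelated with u(n) adds to $\Phi_{vv}(e^{j\omega})$ but not to $\Phi_{vu}(e^{j\omega})$, and therefore pulls the coherence below 1.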


15.3.2 Double-Talk Detection Using the Coherence Function 


Consider the echo cancellation setup of Figure 15.2. The microphone signal d(n) consists
of two components: (i) the echo signal y(n), obtained from passing the far-end signal x(n)
through the near-end room acoustic response (a linear system), and (ii) the summation of
noise and double-talk, lumped together as $e_o(n)$. In the absence of double-talk, $e_o(n)$ may
be negligible and thus one may argue that there will be an almost full coherence between
d(n) and x(n). Hence,

$$c_{dx}(e^{j\omega}) \approx 1, \quad \text{in the absence of double-talk} \tag{15.44}$$

In the presence of double-talk, $\Phi_{dd}(e^{j\omega})$ increases and, thus, $c_{dx}(e^{j\omega})$ may drop
significantly below 1.

In Gänsler et al. (1996), the authors have noted that a high-performance spectrum
analysis technique should be used to evaluate $c_{dx}(e^{j\omega})$ at a number of discrete frequencies
and accordingly form a decision variable

$$D = \frac{1}{L}\sum_{i=0}^{L-1} \hat{c}_{dx}(e^{j\omega_i}) \tag{15.45}$$

where $\hat{c}_{dx}(e^{j\omega_i})$ is the estimate of $c_{dx}(e^{j\omega})$ at $\omega = \omega_i$, and $\omega_i$, for $0 \le i < L$, are the set
of discrete frequencies. If $D$ is greater than a certain threshold, it is assumed that there
is no double-talk; otherwise, it is assumed that a double-talk exists.
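
The decision rule amounts to averaging the coherence estimates and comparing the result with a threshold. A minimal sketch follows; the function name `double_talk` and the threshold value 0.9 are assumptions made for illustration, since the text does not specify a threshold.

```python
def double_talk(coherences, threshold=0.9):
    """Average the per-frequency coherence estimates and apply the threshold test.

    Returns the decision variable D and True when double-talk is assumed,
    that is, when D is not greater than the threshold.
    """
    d_var = sum(coherences) / len(coherences)
    return d_var, not (d_var > threshold)


if __name__ == "__main__":
    print(double_talk([1.0, 0.98, 0.96]))   # high coherence: no double-talk
    print(double_talk([0.5, 0.7, 0.6]))     # low coherence: double-talk assumed
```

In practice the threshold trades missed detections against false alarms and would be tuned on recorded data.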


15.3.3 Numerical Evaluation of the Coherence Function 


In Section 3.6.2, as a side discussion, we presented a numerical method for calculating 
the power spectral density of a process and its cross-power spectral density with another 
process. This was done through a filtering step that extracts narrowband portions of the 
underlying processes in the vicinity of $\omega = \omega_i$. The extracted signal portions are then auto- 
and/or cross-correlated to compute the desired power and cross-power spectral densities 
(Figure 3.7). These steps, when combined with an additional processing step to obtain a 
sample value of $\hat{c}_{dx}(e^{j\omega})$ at $\omega = \omega_i$, are presented in Figure 15.7. The first blocks on the 
left-hand side of the figure are narrowband filters centered around $\omega = \omega_i$. The blocks 
E[·] denote the expected values. However, in practice, it is assumed that the involved 
processes are ergodic and thus the statistical averages are replaced by time averages. The 
combiner box takes $\Phi_{dd}(e^{j\omega_i})$, $\Phi_{xx}(e^{j\omega_i})$, and $\Phi_{dx}(e^{j\omega_i})$ as inputs and calculates 


$$\hat{c}_{dx}(e^{j\omega_i}) = \frac{|\Phi_{dx}(e^{j\omega_i})|^2}{\Phi_{dd}(e^{j\omega_i})\,\Phi_{xx}(e^{j\omega_i})} \tag{15.46}$$


An effective implementation of Eq. (15.46) requires averaging over sufficient samples 
of $|d_i(n)|^2$, $d_i(n)x_i^*(n)$, and $|x_i(n)|^2$, through the E[·] boxes. Also, for the averages to be 
effective, the samples from each of the processes $|d_i(n)|^2$, $d_i(n)x_i^*(n)$, and $|x_i(n)|^2$ should 
be independent of one another. Moreover, for an echo canceller to react fast enough to the 
appearance of double-talk, the signal samples $d_i(n)$ and $x_i(n)$ should be obtained from a 
minimum number of samples of d(n) and x(n), respectively. This is a classical spectrum 
analysis problem for which the best solution is the so-called multitaper spectral estimator 
of Thomson (1982). Here, to avoid diverting too much into the theory of the 
multitaper method, we only briefly discuss the basic ideas behind it and, to provide some 
guidelines, present a few numerical results. The derivation and details of the multitaper 
method are deferred to Appendix 15A at the end of this chapter. 

The multitaper method is effectively a filter-bank-based signal analysis tool. Figure 15.8 
presents the block diagram of a filter-bank-based spectral estimator for a signal x(n). The 
filters $H_0(e^{j\omega}), H_1(e^{j\omega}), \ldots, H_{L-1}(e^{j\omega})$ make a bank of filters that decomposes the input 
signal x(n) into L narrowband signals $x_0(n), x_1(n), \ldots, x_{L-1}(n)$; see Figure 15.9. For 
$i = 0, 1, \ldots, L-1$, the respective power estimator takes a time-average of $|x_i(n)|^2$ as an 
estimate of the signal power over the ith band of the filter bank. Normalizing the power 
estimates with respect to the width of the bands results in estimates of the power spectral 
density of x(n), $\Phi_{xx}(e^{j\omega_i})$, for $i = 0, 1, \ldots, L-1$. 

The filter bank structure of Figure 15.8 consists of L parallel filters that may be centered 
at frequencies $\omega_i = 2\pi i/L$, $i = 0, 1, \ldots, L-1$. This choice of the frequencies $\omega_i$ allows 
an efficient implementation of the L filters in a polyphase structure. The polyphase structure, 
which is also introduced in Appendix 15A, has a complexity equal to that of one of 
the L filters plus one fast Fourier transform (FFT) of size L. The polyphase structure is 
developed based on the fact that the set of filters $H_0(e^{j\omega}), H_1(e^{j\omega}), \ldots, H_{L-1}(e^{j\omega})$ all 
originate from the same filter, called the prototype filter. Here, the prototype filter is $H_0(e^{j\omega})$, 
and the rest of the filters are obtained from this filter through a modulation (i.e., a shift 
of the filter frequency response across the frequency axis). 
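
The modulation idea described above can be illustrated with a short sketch. It is not the polyphase implementation of Appendix 15A; it simply builds each band filter directly as $h_i[n] = h_0[n]e^{j\omega_i n}$ and checks that its response is the prototype response shifted to $\omega_i$. The Hann-shaped prototype, its length, and the names `modulated_bank` and `freq_response` are assumptions made only for this illustration.

```python
import cmath
import math

def modulated_bank(prototype, num_bands):
    """Build h_i[n] = h_0[n] * exp(j*w_i*n) with w_i = 2*pi*i/L for i = 0..L-1."""
    bank = []
    for i in range(num_bands):
        w_i = 2.0 * math.pi * i / num_bands
        bank.append([h * cmath.exp(1j * w_i * n) for n, h in enumerate(prototype)])
    return bank

def freq_response(h, w):
    """Evaluate H(e^{jw}) = sum_n h[n] * exp(-j*w*n)."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))

L = 8          # number of bands (kept small for illustration)
N = 4 * L      # prototype length (hypothetical choice)
# Hypothetical prototype: a Hann-shaped lowpass normalized to unit gain at w = 0.
prototype = [(1.0 - math.cos(2.0 * math.pi * (n + 0.5) / N)) / N for n in range(N)]
bank = modulated_bank(prototype, L)

# Gain of the i-th filter at its own center frequency w_i.
center_gains = [abs(freq_response(bank[i], 2.0 * math.pi * i / L)) for i in range(L)]
```

Every entry of `center_gains` equals the prototype gain at $\omega = 0$, which is the frequency-shift property that the polyphase structure exploits.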

Figure 15.7 Procedure for calculating the coherence function $\hat{c}_{dx}(e^{j\omega})$ at $\omega = \omega_i$. 
Figure 15.8 Filter-bank-based spectrum analyzer. 
Figure 15.9 Presentation of the filters in a filter bank spectrum analyzer. 

Another feature of the multitaper method, from which the name multitaper originates, 
is that for each instant of time, n, a set of samples of $x_i(n)$, for each i, 
is generated based on a set of prototype filters. To differentiate these prototype filters 
and their respective subband signals, we refer to them as $H_0^{(k)}(e^{j\omega})$ and $x_i^{(k)}(n)$, for 
$k = 1, 2, \ldots, K$, where K is the number of prototype filters. Hence, if the power averages 
are made across time, at the time instants $n_1, n_2, \ldots, n_P$, the samples of the power spectral 
density of x(n) are obtained as 


$$\Phi_{xx}(e^{j\omega_i}) = q\sum_{k=1}^{K}\sum_{p=1}^{P} |x_i^{(k)}(n_p)|^2 \tag{15.47}$$


where q is a normalization constant related to the bandwidth of the prototype filter in the 
filter bank. 
Similarly, we obtain 


$$\Phi_{dd}(e^{j\omega_i}) = q\sum_{k=1}^{K}\sum_{p=1}^{P} |d_i^{(k)}(n_p)|^2 \tag{15.48}$$

and 


$$\Phi_{dx}(e^{j\omega_i}) = q\sum_{k=1}^{K}\sum_{p=1}^{P} d_i^{(k)}(n_p)\,x_i^{(k)*}(n_p) \tag{15.49}$$


Finally, Eqs. (15.47), (15.48), and (15.49) are substituted in Eq. (15.46) to calculate 
$\hat{c}_{dx}(e^{j\omega_i})$. We also note that after substituting Eqs. (15.47), (15.48), and (15.49) in 
Eq. (15.46), the normalizing factor q cancels and hence its value becomes irrelevant. 
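
As a rough illustration of how the averages of Eqs. (15.47) to (15.49) feed Eq. (15.46), the sketch below computes the coherence estimate for a single frequency bin from a flat list of the K·P complex subband samples. The subband samples are synthetic (complex Gaussian), and the per-bin echo gain `gain`, the noise level, and the function names are hypothetical; no filter bank is implemented here.

```python
import random

def coherence_estimate(d_samples, x_samples, q=1.0):
    """Eq. (15.46) evaluated with the sums of Eqs. (15.47)-(15.49) for one bin i.

    d_samples and x_samples hold the K*P complex samples d_i^(k)(n_p), x_i^(k)(n_p).
    """
    phi_xx = q * sum(abs(x) ** 2 for x in x_samples)
    phi_dd = q * sum(abs(d) ** 2 for d in d_samples)
    phi_dx = q * sum(d * x.conjugate() for d, x in zip(d_samples, x_samples))
    return abs(phi_dx) ** 2 / (phi_dd * phi_xx)

def decision_d1(coherences):
    """Average the per-bin coherence estimates, in the spirit of Eq. (15.45)."""
    return sum(coherences) / len(coherences)

rng = random.Random(0)

def cgauss():
    """Draw one complex Gaussian sample (synthetic stand-in for a subband sample)."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

x = [cgauss() for _ in range(64)]
gain = 0.7 - 0.2j                                        # hypothetical echo-path gain in this bin
echo_only = [gain * xi for xi in x]                      # no double-talk
double_talk = [gain * xi + 2.0 * cgauss() for xi in x]   # echo plus near-end talker

c_clean = coherence_estimate(echo_only, x)
c_dt = coherence_estimate(double_talk, x)
```

With echo only, the estimate equals 1 up to rounding; adding the near-end component lowers it. Calling `coherence_estimate` with a different `q` returns the same value, which reflects the cancellation of the normalizing factor noted above.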


15.3.4 Power-Based Double-Talk Detectors 


Here, we derive an alternative decision number that proves instructive in relating the 
coherence function, defined in this section, with the variable step-size algorithms that were 
introduced in the various subsections of Section 15.2. Substituting u(n) by x(n), v(n) by 
$d(n) = y(n) + e_o(n)$, and $H(e^{j\omega})$ by $W_o(e^{j\omega})$ in Eqs. (15.40), (15.41), and (15.42), and 
rearranging the result, we obtain 


$$c_{dx}(e^{j\omega}) = \frac{\Phi_{xx}(e^{j\omega})\,|W_o(e^{j\omega})|^2}{\Phi_{dd}(e^{j\omega})} = \frac{\Phi_{yy}(e^{j\omega})}{\Phi_{dd}(e^{j\omega})} \tag{15.50}$$


In the absence of double-talk, $\Phi_{yy}(e^{j\omega}) = \Phi_{dd}(e^{j\omega})$ and thus $c_{dx}(e^{j\omega}) = 1$. On the other 
hand, in the presence of double-talk, $\Phi_{dd}(e^{j\omega}) = \Phi_{yy}(e^{j\omega}) + \Phi_{e_o e_o}(e^{j\omega}) > \Phi_{yy}(e^{j\omega})$ and 
thus $c_{dx}(e^{j\omega}) < 1$. Hence, as $\Phi_{yy}(e^{j\omega})$ and $\Phi_{dd}(e^{j\omega})$ are real and positive functions of 
frequency $\omega$, we may define 


$$D_2 = \frac{\int_{-\pi}^{\pi}\Phi_{yy}(e^{j\omega})\,d\omega}{\int_{-\pi}^{\pi}\Phi_{dd}(e^{j\omega})\,d\omega} \tag{15.51}$$

as an alternative decision number, with properties similar to those of $D_1$. Namely, $D_2 \approx 1$ in 
the absence of double-talk, and $D_2$ is significantly smaller than 1 in the presence of 
double-talk. Moreover, using Parseval's relation (2.60), Eq. (15.51) may be written as 


$$D_2 = \frac{E[y^2(n)]}{E[d^2(n)]} \tag{15.52}$$

Since y(n) is not available, it may appear that $D_2$, as given by Eq. (15.52), is a useless 
decision number. However, noting that upon the convergence of the echo canceller $\hat{y}(n) \approx y(n)$, 
one may adopt an estimate of $D_2$ given by 


$$\hat{D}_2 = \frac{E[\hat{y}^2(n)]}{E[d^2(n)]} \tag{15.53}$$


Moreover, $E[d^2(n)]$ and $E[\hat{y}^2(n)]$ can be replaced by their estimates $\sigma_d^2(n)$ and $\sigma_{\hat{y}}^2(n)$ 
that may be obtained recursively using Eqs. (15.20) and (15.21), respectively. 

The decision number (15.53) may be used as follows. If $\hat{D}_2$ is greater than a threshold 
$\gamma$, set the step-size parameter $\mu$ (or $\tilde{\mu}$) to a large value, for example, set $\tilde{\mu} = 1$. If $\hat{D}_2 < \gamma$, 
set the step-size parameter $\mu$ (or $\tilde{\mu}$) to a small value (possibly, 0). 
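
A minimal sketch of the power-based detector follows. Eqs. (15.20) and (15.21) are not reproduced in this section, so the recursion used here, an exponentially weighted average with forgetting factor `a`, is an assumption, as are the synthetic signals, the value `a = 0.99`, and the threshold `gamma = 0.6`.

```python
import random

def d2_track(d, y_hat, a=0.99, eps=1e-12):
    """Running estimate of D2_hat(n) = sigma_yhat^2(n) / sigma_d^2(n).

    Assumed recursion (not taken from the text): sigma^2(n) = a*sigma^2(n-1) + (1-a)*v^2(n).
    """
    pow_d = 0.0
    pow_y = 0.0
    ratios = []
    for dn, yn in zip(d, y_hat):
        pow_d = a * pow_d + (1.0 - a) * dn * dn
        pow_y = a * pow_y + (1.0 - a) * yn * yn
        ratios.append(pow_y / (pow_d + eps))   # eps guards against division by zero
    return ratios

rng = random.Random(1)
# Synthetic echo estimate; a converged canceller is assumed, so y_hat(n) = y(n).
y_hat = [rng.gauss(0.0, 1.0) for _ in range(4000)]
# Near-end talker is silent for the first half and active for the second half.
near_end = [0.0] * 2000 + [rng.gauss(0.0, 2.0) for _ in range(2000)]
d = [y + v for y, v in zip(y_hat, near_end)]

d2 = d2_track(d, y_hat)
gamma = 0.6                                    # hypothetical threshold
double_talk = [r < gamma for r in d2]          # True where double-talk is declared
```

In this synthetic run the ratio stays at 1 while the near-end talker is silent and settles well below the threshold once the talker becomes active.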

On the other hand, referring to Eq. (15.19), we may find that the variable step-size 
LMS algorithms that were introduced in the various subsections of Section 15.2 also use 
$\hat{D}_2$, however, in an opposite way. When $E[\hat{y}^2(n)]/E[d^2(n)]$ is small, $\tilde{\mu}(n)$ is set to a 
value close to 1 (a large step-size). Moreover, when $E[\hat{y}^2(n)]/E[d^2(n)]$ is close to 1, 
$\tilde{\mu}(n)$ converges to $1 - \eta$ (a smaller step-size). This clearly is not what one wishes to do 
after the convergence of the echo canceller and in the presence of double-talk. 

Considering the above discussion, one may propose the following procedure for dealing 
with double-talk. The step-size parameter $\tilde{\mu}(n)$ is set according to Eq. (15.19) for the 
first few seconds after the activation of the echo canceller, and switches to the following 
setting afterwards: 


$$\tilde{\mu}(n) = \begin{cases} \tilde{\mu}_1, & \text{if } \hat{D}_2 > \gamma \\ \tilde{\mu}_2, & \text{if } \hat{D}_2 < \gamma \end{cases} \tag{15.54}$$


where $\tilde{\mu}_1$ is for the case when there is no double-talk, and $\tilde{\mu}_2$ is a small (possibly 0) 
step-size. 
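
The two-phase procedure above can be sketched as a small selection function. The warm-up length, the numerical values, and in particular `initial_rule` are hypothetical: Eq. (15.19) is not reproduced in this section, so the caller supplies whatever start-up rule is in use. The case $\hat{D}_2 = \gamma$, which Eq. (15.54) leaves open, is treated here as double-talk.

```python
def step_size(n, d2_hat, gamma, mu_no_dt, mu_dt, warmup, rule_during_warmup):
    """Return the step-size at time n.

    During the first `warmup` samples a caller-supplied rule (standing in for
    Eq. (15.19)) is used; afterwards the two-level rule of Eq. (15.54) applies.
    """
    if n < warmup:
        return rule_during_warmup(d2_hat)
    return mu_no_dt if d2_hat > gamma else mu_dt

def initial_rule(ratio, eta=0.5):
    """Hypothetical start-up rule: near 1 for small ratios, 1 - eta when ratio is 1."""
    return 1.0 - eta * ratio

# Example: 8000-sample warm-up, then switch between 1.0 (no double-talk) and 0.0.
mu_start = step_size(100, 0.9, 0.6, 1.0, 0.0, 8000, initial_rule)
mu_clear = step_size(9000, 0.9, 0.6, 1.0, 0.0, 8000, initial_rule)
mu_talk = step_size(9000, 0.2, 0.6, 1.0, 0.0, 8000, initial_rule)
```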

We note that the computation of the decision number $\hat{D}_2$ is much simpler than that of $D_1$. 
However, the use of $\hat{D}_2$ can be problematic if the echo canceller, for any reason, 
diverges away from its optimum setting. A divergence of the echo canceller from its 
optimum setting can result in a value of $\hat{D}_2$ smaller than $\gamma$, which, if Eq. (15.54) is 
used, results in the selection $\tilde{\mu}(n) = \tilde{\mu}_2$. This, in turn, can lead to a stalling or a very slow 
convergence of the algorithm. The decision number $D_1$, on the other hand, is independent 
of the state of the echo canceller and, thus, never faces the latter problem. 


15.3.5 Numerical Results 


To develop a more in-depth understanding of the double-talk detector mechanisms that 
were discussed above, we present the results of an experiment. In this experiment, we 
start with a speech signal with a duration of 80 s. This is treated as the signal x(n) in 
the AE cancellation setup of Figure 15.2. This is passed through a room acoustic path with 
the same response as the one in Figure 15.4. A double-talk speech signal is then added 
in the intervals of (15 to 30), (45 to 50), and (60 to 70) s to form d(n). It is assumed that 
there is no noise in the room. 

To evaluate the decision number $D_2$, we assume that the echo canceller is perfectly 
matched to the room acoustic, and hence $\hat{y}(n) = y(n)$. The mean-squared values of 
d(n) and $\hat{y}(n)$ are then calculated based on Eqs. (15.20) and (15.21), respectively, with 
a = 0.999. 

The decision number $D_1$ is evaluated by implementing multiple copies of the block diagram 
shown in Figure 15.7, using K multitaper prototype filters. The quantities $|d_i(n)|^2$, 
$d_i(n)x_i^*(n)$, and $|x_i(n)|^2$ are averaged across different prototype filters and across time. 
To implement these efficiently, polyphase filter banks are adopted (see Appendix 15A 
for the details). The number of subbands in the filter banks is set equal to L = 256. The 
length of each prototype filter is KL. With K = 8, this results in a length KL = 2048. The 
MATLAB program used to generate the results that are presented in Figures 15.10 and 
15.11 is available on the accompanying website. It is called “DTDexperiment.m.” 

Figure 15.10 Variation of the decision number $D_1$ in a double-talk detector. L = 256. 

Both results in Figures 15.10 and 15.11 supply information that may be considered 
sufficient for conveying the presence of the double-talk intervals, that is, the intervals between 
each pair of the vertical dashed lines. The decision number $D_2$ has more distinct edges 
showing the transition into and out of each double-talk period. The decision number $D_1$, 
on the other hand, more clearly switches to a lower level during each double-talk period. 

From the discussion above (and also by going through the MATLAB program 
“DTDexperiment.m”), it is obvious that the generation of $D_1$ requires a lot more computational 
complexity than the generation of $D_2$. However, to compute $D_2$, the assumption is made 
that the echo canceller has already converged. Unfortunately, this may not always be 
true. Hence, in practice, the decision number $D_2$ is less reliable than $D_1$. In other words, 
we pay a very high price (in terms of computational complexity) to implement a more 
reliable double-talk detector. 

A reader may recall from Eq. (15.46) that in the absence of double-talk, the coherence 
function $c_{dx}(e^{j\omega})$ should be equal to 1, for any $\omega$; thus $D_1$ should also be equal to 1. This 
is not the case in Figure 15.10. This difference from the theory is related to the fact that 
the narrowband filters used to generate $d_i(n)$ and $x_i(n)$ are not sufficiently 
narrow. An interested reader may run the MATLAB program “DTDexperiment.m,” 
available on the accompanying website, with a larger value of the parameter L, to see 
how $D_1$ improves. However, we should also note that increasing L results in further 
delay in the detection of double-talk. This is because as L increases, the length of the 
filters in the filter banks increases and this in turn increases the filter bank (group) delay. 
Therefore, the choice of L is a compromise that should be made by the system designer. 
Figure 15.12 presents the result of repeating the experiment that led to Figure 15.10, with 
L increased from 256 to 1024. The improvement in the result is clear. At the same time, 
a delay in the detection of transitions into a double-talk period and out of a double-talk 
period is more clearly seen. 

Figure 15.11 Variation of the decision number $D_2$ in a double-talk detector. 
Figure 15.12 Variation of the decision number $D_1$ in a double-talk detector. L = 1024. 


15.4 Howling Suppression 


As discussed earlier, howling can occur as a result of a possible positive feedback between 
the AE and the HE (Figure 15.1). Two methods of howling suppression have been devel- 
oped and discussed in the literature. These methods are reviewed in the following sections. 


15.4.1 Howling Suppression Through Notch Filtering 


A howling phenomenon is characterized by a sine wave (line spectrum) in the signal at any 
point of the AE—HE loop. Hence, a method of suppressing howling would be to deploy a 
signal analysis technique that evaluates the spectral content of the signal at an appropriate 
point in the AE—HE loop and adds a notch filter at the point in frequency that a howling 
is perceived. In a system that is equipped with an echo canceller, the most appropriate 
point in the loop is at the AE canceller output, e(n). 

A variety of schemes can be suggested to accomplish this task. A simple scheme that we 
have examined and found effective is to perform signal analysis and notch filtering jointly 
by taking the following steps. 


1. Take a block of length L of the signal samples and apply an FFT to them to evaluate 
the spectral content of the signal. 

2. Over successive blocks evaluate the measured spectra and search for any spectral line 
that may be growing in amplitude. 

3. Introduce and apply a real gain 0 < a < 1 at the frequency bin(s) where howling is 
perceived. 

4. Reduce the gain a if the howling persists. 

5. Convert back the frequency domain samples to the time domain. 


One may find this method particularly appealing in cases where the AE canceller is 
implemented using the FBLMS algorithm; as in that case, the tasks of converting the 
samples to the frequency domain and back to the time domain are already part of the 
system. 
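
The five steps listed above can be sketched as follows. This is an open-loop illustration on a synthetic signal, not a closed AE-HE loop: a spectral line that doubles in amplitude from block to block stands in for howling. The detection rule (magnitude growing by more than a factor `growth` between successive blocks), the parameter values, and the use of a plain DFT in place of an FFT are all assumptions made for brevity.

```python
import cmath
import math

def dft(x, inverse=False):
    """Plain O(n^2) DFT, or inverse DFT when inverse=True (stand-in for an FFT)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def suppress_howling(blocks, growth=1.5, gain=0.1, floor=1e-3):
    """Attenuate bins whose magnitude keeps growing from one block to the next."""
    n = len(blocks[0])
    gains = [1.0] * n
    prev_mag = None
    cleaned = []
    for block in blocks:
        spec = dft(block)                                   # step 1: spectral content
        mag = [abs(v) for v in spec]
        if prev_mag is not None:
            for k in range(1, n // 2):                      # step 2: look for growing lines
                if mag[k] > floor and mag[k] > growth * prev_mag[k]:
                    gains[k] *= gain                        # steps 3 and 4: apply, then reduce, the gain
                    gains[n - k] = gains[k]                 # mirror bin, so the output stays real
        prev_mag = mag
        shaped = [g * v for g, v in zip(gains, spec)]
        cleaned.append([v.real for v in dft(shaped, inverse=True)])   # step 5: back to time domain
    return cleaned

L = 32
blocks = []
for b in range(4):
    amp = 0.5 * 2 ** b   # line at bin 5 doubles every block; line at bin 9 is steady
    blocks.append([amp * math.cos(2.0 * math.pi * 5 * m / L) + math.cos(2.0 * math.pi * 9 * m / L)
                   for m in range(L)])
cleaned = suppress_howling(blocks)
```

In this example only bin 5 (and its mirror) is attenuated, and the attenuation deepens with every block in which the line keeps growing, while the steady line at bin 9 passes unchanged.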


15.4.2 Howling Suppression by Spectral Shift 


The ITU-T Recommendation G.167, the standard for AE controllers, has recommended 
the following method to avoid howling. Slightly shift the spectrum of the transmit signal 
e(n). The maximum frequency shift allowed is 5 Hz. 

To understand how this mechanism works, consider the feedback system presented in 
Figure 15.13. It consists of a microphone, a loudspeaker, the room acoustics, and any 
electronic circuits in the AE—HE loop. We model the loudspeaker and the microphone 
as blocks with ideal transfer functions of unity. The room acoustics plus any electronic 
Figure 15.13 An acoustic-electronic feedback system that may result in howling. 


circuits in the loop are collectively modeled by a transfer function H(f) = H,(f)M (f). 
Then, howling will occur at a frequency $f = f_0$, if $|H(f_0)| > 1$ and $\angle H(f_0) = 2k\pi$, 
for some integer k. 

If the ITU-T spectral shift recommendation is included in Figure 15.13, the resulting 
system may be modeled as in Figure 15.14. Now, assuming that the spectrum shifter 
introduces a frequency shift of $\Delta f$, the loop gain of the system in Figure 15.14 will be 
$H(f + \Delta f)$. The following plausible statement has been made in the literature to explain 
why spectral shift suppresses howling. Any frequency component that propagates through 
the loop, starting from a point, returns to the same point with a different frequency. 
Hence, no single frequency component gets a chance to grow in amplitude as it 
circles in the loop and, therefore, howling is avoided. 

Figure 15.14 Block diagram of Figure 15.13 with an added spectrum shifter to avoid howling. 
Figure 15.15 Block diagram of spectrum shifter. 

The frequency shifter in Figure 15.14 is implemented as follows. We first note that 
a direct modulation of u(t) according to the equation $u_m(t) = 2u(t)\cos(2\pi\Delta f t)$ does 
not work, because $U_m(f) = U(f + \Delta f) + U(f - \Delta f)$; that is, $u_m(t)$ will contain two 
replicas of U(f). In other words, both a positive and a negative shift of the spectrum are 
produced. The ITU-T recommendation is to convert U(f) to $U(f + \Delta f)$ or $U(f - \Delta f)$ 
only. This can be accomplished by following the structure presented in Figure 15.15. The 
block labeled $H_{VSB}(f)$ is a vestigial side-band (VSB) filter that removes the part of the 
spectrum of u(t) on the negative side of the frequency axis. The resulting signal, $u_{VSB}(t)$, 
is mixed with the complex sine wave $e^{j2\pi\Delta f t}$; hence, its spectrum is shifted to the right 
by $\Delta f$. Taking the real part of the result and applying a gain of 2 results in a signal v(t), 
whose Fourier transform, V(f), is related to U(f) according to the following equation 


$$V(f) = \begin{cases} U(f - \Delta f), & f > 0 \\ U(f + \Delta f), & f < 0 \end{cases} \tag{15.55}$$
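
The three operations of the shifter (remove the negative-frequency half, mix with a complex exponential, take twice the real part) can be checked on a test tone with the following sketch. An ideal block-wise DFT mask is used here as a stand-in for the VSB filter, which is an assumption made only to keep the example short; it also discards the DC and Nyquist bins. The sampling rate, tone frequency, and shift are arbitrary illustration values.

```python
import cmath
import math

def dft(x, inverse=False):
    """Plain O(n^2) DFT, or inverse DFT when inverse=True."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def shift_spectrum(u, delta_f, fs):
    """Shift the spectrum of the real block u by delta_f Hz (fs is the sampling rate)."""
    n = len(u)
    spec = dft(u)
    # Ideal stand-in for H_VSB: keep only the positive-frequency bins 1 .. n/2 - 1.
    one_sided = [spec[k] if 0 < k < n // 2 else 0.0 for k in range(n)]
    u_vsb = dft(one_sided, inverse=True)
    # Mix with exp(j*2*pi*delta_f*t), take the real part, and apply a gain of 2.
    return [2.0 * (v * cmath.exp(2j * math.pi * delta_f * m / fs)).real
            for m, v in enumerate(u_vsb)]

fs = 128.0
n = 128
u = [math.cos(2.0 * math.pi * 10.0 * m / fs) for m in range(n)]   # 10 Hz test tone
v = shift_spectrum(u, 3.0, fs)                                    # expected: a 13 Hz tone
```

For this input, `v` matches a unit-amplitude cosine at 13 Hz, in agreement with Eq. (15.55) for the positive-frequency component and its negative-frequency mirror.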


Figure 15.16 Construction of $H_{VSB}(e^{j\omega})$ from a half-band filter $H_{HB}(e^{j\omega})$. 

When u(t) is replaced by its sampled version, u(n), $H_{VSB}(f)$ should also be implemented 
in discrete time. We use $h_{VSB,n}$ and $H_{VSB}(e^{j\omega})$ to denote the impulse response 
and the frequency response of this discrete-time filter. Figure 15.16 presents a possible 
magnitude response of $h_{VSB,n}$ and the procedure that we suggest for its design. We start 
with the design of a half-band filter $H_{HB}(e^{j\omega})$. This is called a half-band filter because its 
passband occupies half of the full band $0 < \omega < \pi$ (equivalently, $-\pi/2 < \omega < \pi/2$). A 
useful implementation property of half-band filters is that almost half 
of their coefficients are 0. More specifically, $h_{HB,n} = 0$, for $n = \pm 2, \pm 4, \ldots$ This reduces 
the complexity of their implementation by one-half. Once $H_{HB}(e^{j\omega})$ is designed, it can be 
converted to the desired filter $H_{VSB}(e^{j\omega})$ by modulating its coefficients with the sequence 
$\{j^n\} = \{\ldots, j, -1, -j, 1, j, -1, -j, \ldots\}$. Noting these points, given a half-band design 
$h_{HB,n}$, the coefficients of the desired filter $h_{VSB,n}$ are complex-valued, with the real and 
imaginary parts 


$$h^{R}_{VSB,n} = \begin{cases} h_{HB,0}, & n = 0 \\ 0, & \text{otherwise} \end{cases} \tag{15.56}$$

and 

$$h^{I}_{VSB,n} = \begin{cases} 0, & n = 0, \pm 2, \pm 4, \ldots \\ (-1)^{(n-1)/2}\, h_{HB,n}, & n = \pm 1, \pm 3, \ldots \end{cases} \tag{15.57}$$

respectively. 
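
The construction of Eqs. (15.56) and (15.57) can be tried out with the following sketch. The half-band design used here (a Hann-windowed ideal response with taps indexed from −M to M) is a hypothetical choice made for the example and is not the design procedure of Figure 15.16; the function names are likewise illustrative.

```python
import cmath
import math

def halfband(M):
    """Hypothetical zero-phase half-band design: Hann-windowed ideal response, n = -M..M."""
    h = {}
    for n in range(-M, M + 1):
        if n == 0:
            ideal = 0.5
        elif n % 2 == 0:
            ideal = 0.0                       # even-indexed taps of a half-band filter are zero
        else:
            ideal = math.sin(math.pi * n / 2.0) / (math.pi * n)
        window = 0.5 * (1.0 + math.cos(math.pi * n / (M + 1)))
        h[n] = ideal * window
    return h

def vsb_parts(h_hb):
    """Real and imaginary parts of h_VSB,n = j^n * h_HB,n, per Eqs. (15.56)-(15.57)."""
    real, imag = {}, {}
    for n, c in h_hb.items():
        if n % 2 == 0:
            real[n] = c if n == 0 else 0.0
            imag[n] = 0.0
        else:
            sign = 1.0 if ((n - 1) // 2) % 2 == 0 else -1.0   # (-1)^((n-1)/2)
            real[n] = 0.0
            imag[n] = sign * c
    return real, imag

def response(real, imag, w):
    """Evaluate H_VSB(e^{jw}) = sum_n h_VSB,n * exp(-j*w*n)."""
    return sum(complex(real[n], imag[n]) * cmath.exp(-1j * w * n) for n in real)

h_hb = halfband(15)
h_re, h_im = vsb_parts(h_hb)
gain_pos = abs(response(h_re, h_im, math.pi / 2.0))    # middle of the positive band
gain_neg = abs(response(h_re, h_im, -math.pi / 2.0))   # middle of the negative band
```

The positive-frequency gain comes out close to 1 and the negative-frequency gain close to 0, as expected from a response that equals the half-band response shifted by $\pi/2$.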


To help the reader develop a better understanding of the impact of a spectral shift on 
the quality of speech signals, the MATLAB code “specshift.m,” on the accompanying 
website, takes a speech signal, plays it, then introduces a shift in its spectrum and replays it. 
The amount of spectral shift is a parameter. This will allow a curious reader to better 
understand what range of spectral shift is acceptable and why a maximum frequency 
shift of 5 Hz has been imposed in the ITU-T Recommendation G.167. 


15.5 Stereophonic Acoustic Echo Cancellation 


The AE canceller setup that was presented in Figure 15.2 and was discussed in detail in 
the previous sections is a monophonic system. Stereophonic AE cancellers have also been 
proposed and developed. Figure 15.17 presents the details of a stereophonic AE canceller 
setup. Compared to its monophonic counterpart, stereophonic teleconferencing provides 
spatial information that helps a listener to distinguish the relative position of the speaker 
in the far-end teleconferencing room. 

There are two microphones and two loudspeakers in each of the near-end and far-end 
teleconferencing rooms. Hence, considering the echo cancellation by the near-end 
room, there should be a pair of echo cancellers for removing echo signals from each of 
the microphones. A similar pair of echo cancellers should also be deployed by the far-end 
room. In Figure 15.17, for brevity and clarity of presentation, only one of the echo 
cancellers by the near-end room is shown. The relevant signals and signal paths/echoes 
are also shown. The speech signal, s(n), from a person in the far-end room passes through 
the acoustic impulse responses $g_{1,n}$ and $g_{2,n}$ before reaching the pair of microphones in 
the far-end room. Assuming ideal transmission lines, one will find that 


$$x_1(n) = g_{1,n} \star s(n) \tag{15.58}$$


and 

$$x_2(n) = g_{2,n} \star s(n) \tag{15.59}$$


Figure 15.17 A stereophonic acoustic echo cancellation setup (far-end conference room and near-end conference room). 

The signals $x_1(n)$ and $x_2(n)$, after being broadcast in the near-end room, will be picked 
up by the pair of microphones in the room. Here, the discussion is limited to the signal 
picked up by the microphone on the left-hand side. The room echo responses between the 
loudspeakers in the near-end room and this microphone are represented by the impulse 
responses $w_{o1,n}$ and $w_{o2,n}$. The AEC inputs are $x_1(n)$ and $x_2(n)$, and its output is given by 

$$\hat{y}(n) = \mathbf{w}_1^T(n)\mathbf{x}_1(n) + \mathbf{w}_2^T(n)\mathbf{x}_2(n) \tag{15.60}$$

where 

$$\mathbf{w}_1(n) = [w_{1,0}(n)\;\; w_{1,1}(n)\;\cdots\; w_{1,N-1}(n)]^T \tag{15.61}$$

$$\mathbf{w}_2(n) = [w_{2,0}(n)\;\; w_{2,1}(n)\;\cdots\; w_{2,N-1}(n)]^T \tag{15.62}$$

$$\mathbf{x}_1(n) = [x_1(n)\;\; x_1(n-1)\;\cdots\; x_1(n-N+1)]^T \tag{15.63}$$

$$\mathbf{x}_2(n) = [x_2(n)\;\; x_2(n-1)\;\cdots\; x_2(n-N+1)]^T \tag{15.64}$$

Defining 

$$\mathbf{w}(n) = [w_{1,0}(n)\;\; w_{2,0}(n)\;\; w_{1,1}(n)\;\; w_{2,1}(n)\;\cdots\; w_{1,N-1}(n)\;\; w_{2,N-1}(n)]^T \tag{15.65}$$

and 

$$\mathbf{x}(n) = [x_1(n)\;\; x_2(n)\;\; x_1(n-1)\;\; x_2(n-1)\;\cdots\; x_1(n-N+1)\;\; x_2(n-N+1)]^T \tag{15.66}$$

Equation (15.60) may be rearranged as 


$$\hat{y}(n) = \mathbf{w}^T(n)\mathbf{x}(n) \tag{15.67}$$


This, clearly, has the same form as Eq. (15.3), with the apparently minor difference that the 
length of the vectors $\mathbf{w}(n)$ and $\mathbf{x}(n)$ has been doubled. Noting this, one may argue that all 
the algorithms that were discussed in Section 15.2 for the monophonic AE cancellers may 
be readily extended to their stereophonic counterparts. However, as discussed below, this 
is not really the case. There are certain peculiar behaviors in stereophonic AE cancellers 
that need to be taken care of through special methods that will be introduced in the 
subsequent parts of this section. 
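
The equivalence between the two-channel output and the single stacked inner product can be confirmed with a few lines of code. The sketch assumes that the stacked vectors interleave the two channels tap by tap; the numerical values are arbitrary.

```python
def interleave(a, b):
    """Stack two length-N vectors tap by tap: [a0, b0, a1, b1, ...]."""
    stacked = []
    for u, v in zip(a, b):
        stacked.extend([u, v])
    return stacked

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))

# Arbitrary illustration values for N = 3 taps per channel.
w1 = [0.5, -0.2, 0.1]
w2 = [0.3, 0.4, -0.1]
x1 = [1.0, 2.0, -1.0]      # x1(n), x1(n-1), x1(n-2)
x2 = [0.5, -1.5, 2.0]      # x2(n), x2(n-1), x2(n-2)

y_two_channel = dot(w1, x1) + dot(w2, x2)                 # two separate inner products
y_stacked = dot(interleave(w1, w2), interleave(x1, x2))   # one inner product of length 2N
```

Any stacking order gives the same output as long as the tap-weight and input vectors are stacked in the same way.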


15.5.1 The Fundamental Problem 


There is a fundamental problem with stereophonic AE cancellers that distinguishes them 
from their monophonic counterparts. In the latter, under relatively weak constraints on the 
spectrum of the input signal to the echo canceller, the tap-weight vector $\mathbf{w}(n)$, in the mean, 
converges toward the room echo response $\mathbf{w}_o$, if the length of $\mathbf{w}(n)$ is greater than or equal 
to the length of $\mathbf{w}_o$. Surprisingly, this is not the case in a stereophonic echo canceller. In 
the sequel, we explain this peculiar behavior of the stereophonic echo cancellers from the 
frequency domain point of view and from the time domain point of view. 

To present the frequency domain point of view of the problem, we recall the 
definition (15.40) and note that the coherence function between $x_1(n)$ and $x_2(n)$ is defined as 


$$c_{x_1x_2}(e^{j\omega}) = \frac{|\Phi_{x_1x_2}(e^{j\omega})|^2}{\Phi_{x_1x_1}(e^{j\omega})\,\Phi_{x_2x_2}(e^{j\omega})} \tag{15.68}$$


Moreover, recalling Eqs. (15.58) and (15.59) and using the relevant identities from 
Chapter 2 (also, see Problem P2.14), one may find that 


$$\Phi_{x_1x_1}(e^{j\omega}) = \Phi_{ss}(e^{j\omega})\,|G_1(e^{j\omega})|^2 \tag{15.69}$$

$$\Phi_{x_2x_2}(e^{j\omega}) = \Phi_{ss}(e^{j\omega})\,|G_2(e^{j\omega})|^2 \tag{15.70}$$

$$\Phi_{x_1x_2}(e^{j\omega}) = \Phi_{ss}(e^{j\omega})\,G_1(e^{j\omega})\,G_2^*(e^{j\omega}) \tag{15.71}$$

Substituting these results in Eq. (15.68), one will find that 

$$c_{x_1x_2}(e^{j\omega}) = 1 \tag{15.72}$$


which may be interpreted as $x_1(n)$ and $x_2(n)$ being fully coherent. This, as 
demonstrated below, has the consequence that the tap-weight vector $\mathbf{w}(n)$ of an AE canceller 
may not converge to the desired unique solution $\mathbf{w} = \mathbf{w}_o$. 

Clearly, a trivial as well as desirable choice for the pair of tap-weight vectors of the 
echo canceller, after convergence, is $\mathbf{w}_1 = \mathbf{w}_{o1}$ and $\mathbf{w}_2 = \mathbf{w}_{o2}$. In the frequency domain, 
these are written as 


$$W_1(e^{j\omega}) = W_{o1}(e^{j\omega}) \tag{15.73}$$

and 

$$W_2(e^{j\omega}) = W_{o2}(e^{j\omega}) \tag{15.74}$$


respectively. This choice, obviously, leads to the estimate $\hat{y}(n) = y(n)$. Hence, it also 
leads to the desired output error 

$$e(n) = e_o(n) \tag{15.75}$$


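A fundamental question to ask here is “are Eqs. (15.73) and (15.74) the only choices 
of $W_1(e^{j\omega})$ and $W_2(e^{j\omega})$, respectively, that result in the desirable result (15.75)?”. 
We answer this question by providing a few choices that show that the above solution is 
not unique. We first note that the choice (15.73) and (15.74) implies that 


$$\begin{aligned} \hat{Y}(e^{j\omega}) &= S(e^{j\omega})\big(G_1(e^{j\omega})W_1(e^{j\omega}) + G_2(e^{j\omega})W_2(e^{j\omega})\big) \\ &= S(e^{j\omega})\big(G_1(e^{j\omega})W_{o1}(e^{j\omega}) + G_2(e^{j\omega})W_{o2}(e^{j\omega})\big) = Y(e^{j\omega}) \end{aligned} \tag{15.76}$$


On the other hand, one can easily show that the alternative choice 


$$W_1(e^{j\omega}) = W_{o1}(e^{j\omega}) + \alpha G_2(e^{j\omega}) \tag{15.77}$$

and 

$$W_2(e^{j\omega}) = W_{o2}(e^{j\omega}) - \alpha G_1(e^{j\omega}) \tag{15.78}$$


for any value of $\alpha$, also leads to the identity $\hat{Y}(e^{j\omega}) = Y(e^{j\omega})$, hence the desirable result 
(15.75). Surprisingly, there are more choices that also lead to Eq. (15.75). A couple of such 
choices are 


$$W_1(e^{j\omega}) = \alpha\left(W_{o1}(e^{j\omega}) + \frac{G_2(e^{j\omega})}{G_1(e^{j\omega})}W_{o2}(e^{j\omega})\right) \tag{15.79}$$

and 

$$W_2(e^{j\omega}) = (1-\alpha)\left(W_{o2}(e^{j\omega}) + \frac{G_1(e^{j\omega})}{G_2(e^{j\omega})}W_{o1}(e^{j\omega})\right) \tag{15.80}$$

or 

$$W_1(e^{j\omega}) = \alpha W_{o1}(e^{j\omega}) + (1-\alpha)\frac{G_2(e^{j\omega})}{G_1(e^{j\omega})}W_{o2}(e^{j\omega}) \tag{15.81}$$

and 

$$W_2(e^{j\omega}) = \alpha W_{o2}(e^{j\omega}) + (1-\alpha)\frac{G_1(e^{j\omega})}{G_2(e^{j\omega})}W_{o1}(e^{j\omega}) \tag{15.82}$$


for any value of $\alpha$. 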
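
The nonuniqueness is easy to confirm numerically at a single frequency. In the sketch below, the complex values chosen for $S$, $G_1$, $G_2$, $W_{o1}$, $W_{o2}$, and $\alpha$ are arbitrary illustration values; each candidate pair is checked against the echo spectrum produced by the true responses.

```python
def echo_estimate(S, G1, G2, W1, W2):
    """Y_hat = S * (G1*W1 + G2*W2), evaluated at a single frequency."""
    return S * (G1 * W1 + G2 * W2)

# Arbitrary complex values at one frequency (illustration only).
S, G1, G2 = 1.2 - 0.4j, 0.8 + 0.3j, -0.5 + 0.9j
Wo1, Wo2 = 0.6 - 0.1j, 0.2 + 0.7j
alpha = 0.37

Y = echo_estimate(S, G1, G2, Wo1, Wo2)       # echo produced by the true room responses

candidates = {
    "(15.77)-(15.78)": (Wo1 + alpha * G2, Wo2 - alpha * G1),
    "(15.79)-(15.80)": (alpha * (Wo1 + G2 / G1 * Wo2),
                        (1 - alpha) * (Wo2 + G1 / G2 * Wo1)),
    "(15.81)-(15.82)": (alpha * Wo1 + (1 - alpha) * G2 / G1 * Wo2,
                        alpha * Wo2 + (1 - alpha) * G1 / G2 * Wo1),
}
errors = {name: abs(echo_estimate(S, G1, G2, W1, W2) - Y)
          for name, (W1, W2) in candidates.items()}
```

All three entries of `errors` are zero up to rounding, even though none of the candidate pairs equals the true pair $(W_{o1}, W_{o2})$.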

This observation shows that there is an infinite set of choices for $W_1(e^{j\omega})$ and 
$W_2(e^{j\omega})$ that lead to the desirable echo canceller output error (15.75). Moreover, 
examining any of the pairs of choices ((15.77), (15.78)), ((15.79), (15.80)), and ((15.81), 
(15.82)), one may note that the presence of these extended solutions is a consequence of 
the fact that $x_1(n)$ and $x_2(n)$ both originate from the same source, s(n). This, in turn, 
relates to the full coherence relation between $x_1(n)$ and $x_2(n)$, that is, the identity (15.72). 

The nonunique solutions presented by the pairs ((15.77), (15.78)), ((15.79), (15.80)), 
and ((15.81), (15.82)) suffer from the following problematic property. They depend on 
the acoustic signal propagations, $G_1(e^{j\omega})$ and $G_2(e^{j\omega})$, in the far-end room. This is 
problematic, particularly here, because in a stereophonic teleconferencing setup $G_1(e^{j\omega})$ 
and $G_2(e^{j\omega})$ vary frequently as different people (at different locations in the far-end room) 
talk during different time periods. Hence, special measures have to be taken to avoid 
this potential problem of stereophonic AE cancellers. Such measures are introduced in 
Section 15.5.2. 

The pairs ((15.77), (15.78)), ((15.79), (15.80)), and ((15.81), (15.82)) may be viewed as 
unconstrained solutions to the problem of stereophonic AE cancellation. In particular, the 
last two pairs may not be feasible in practice because divisions by $G_1(e^{j\omega})$ and $G_2(e^{j\omega})$ 
can lead to noncausal and/or unstable systems. Nevertheless, when $g_{1,n}$ and $g_{2,n}$ have 
finite durations that are shorter than or equal to the lengths of the echo canceller 
tapped-delay lines, namely the length of $w_{1,n}$ and $w_{2,n}$, Eqs. (15.77) and (15.78) are realizable 
and they have the time-domain equivalents of 


$$\mathbf{w}_1 = \mathbf{w}_{o1} + \alpha\mathbf{g}_2 \tag{15.83}$$


and 

$$\mathbf{w}_2 = \mathbf{w}_{o2} - \alpha\mathbf{g}_1 \tag{15.84}$$


respectively. Obviously, $\mathbf{g}_1$ and $\mathbf{g}_2$ are the vectors with the elements of $g_{1,n}$ and $g_{2,n}$, 
respectively. 
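
The time-domain pair (15.83) and (15.84) can be checked with short impulse responses. In this sketch, the responses, the source samples, and $\alpha$ are arbitrary illustration values, and all far-end responses are taken to be no longer than the canceller, as the text requires.

```python
def conv(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for k, v in enumerate(b):
            out[i + k] += u * v
    return out

def add(a, b):
    """Element-wise sum of two equal-length sequences."""
    return [u + v for u, v in zip(a, b)]

# Arbitrary illustration values: 3-tap far-end responses and 3-tap room responses.
g1, g2 = [1.0, 0.5, 0.25], [0.8, -0.3, 0.1]
wo1, wo2 = [0.4, 0.2, -0.1], [0.3, -0.2, 0.05]
alpha = 0.6
s = [1.0, -2.0, 0.5, 3.0, -1.0]                 # far-end source samples

x1, x2 = conv(g1, s), conv(g2, s)               # Eqs. (15.58) and (15.59)
w1 = [a + alpha * b for a, b in zip(wo1, g2)]   # Eq. (15.83)
w2 = [a - alpha * b for a, b in zip(wo2, g1)]   # Eq. (15.84)

y = add(conv(wo1, x1), conv(wo2, x2))           # true echo
y_hat = add(conv(w1, x1), conv(w2, x2))         # echo estimate of the alternative solution
max_error = max(abs(u - v) for u, v in zip(y, y_hat))
```

The extra terms cancel because convolution commutes, so `max_error` is zero up to rounding although `w1` and `w2` differ from the true responses.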

Finally, we note that, in practice, the acoustic impulse responses $g_{1,n}$ and $g_{2,n}$, as well 
as $w_{o1,n}$ and $w_{o2,n}$, are usually longer than the echo canceller length, albeit with very 
low energy in their tails. Noting this, strictly speaking, one may find that the alternative 
solutions such as those in Eqs. (15.83) and (15.84) do not really minimize the output error. 
They may be viewed only as approximations to the solution of the underlying Wiener-Hopf equation. 
From what we learnt in the previous chapters, the discussion in this section leads to a 
conclusion that may be cast into the following form. 

In a stereophonic AE cancellation setup, the correlation matrix $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$, 
where $\mathbf{x}(n)$ is defined as in Eq. (15.66), is usually highly ill-conditioned. That is, it 
has a set of eigenvalues that may be very close to 0. As a result, the echo canceller suffers 
from some very slow modes of convergence. This, in turn, results in convergence of the 
echo canceller to a set of coefficients that may not match the samples of the conference 
room acoustic response. This leads to a misalignment of the echo canceller coefficients, 
where the misalignment, as presented before, is defined as in Eq. (15.6). Moreover, the 
misalignment depends on the acoustic responses, $g_{1,n}$ and $g_{2,n}$, in the far-end room. 
As noted earlier, this is problematic because a change of speaker in the far-end room 
results in a dramatic change in the responses $g_{1,n}$ and $g_{2,n}$ and hence, the echoes may be 
heard momentarily in the far-end room. This of course is unacceptable, and solutions that 
minimize misalignment are necessary for satisfactory operation of any stereophonic AE 
canceller. Such solutions are discussed next. 

To remind the reader of how a close to zero eigenvalue, arising from a highly 
ill-conditioned correlation matrix $\mathbf{R}$ in an adaptive filter, leads to a significant misalignment 
in the converged coefficients, Figure 15.18 presents an example of the performance surface 
of a 2-tap adaptive filter when one of the eigenvalues of $\mathbf{R}$ is 0. As seen, starting from 
any point on the performance surface, a gradient algorithm converges to the nearest point 
on the line marked with $\xi = \xi_{\min}$ and remains there, because there 
is no gradient to push the tap weights $w_0$ and $w_1$ toward the desired optimum point 
$(w_{o,0}, w_{o,1})$. 
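
The behavior described for Figure 15.18 can be reproduced with a small steepest-descent sketch. The singular correlation matrix, the optimum point, the step size, and the two starting points below are arbitrary illustration values; the gradient is taken as $2\mathbf{R}\mathbf{w} - 2\mathbf{p}$ with $\mathbf{p} = \mathbf{R}\mathbf{w}_o$.

```python
def steepest_descent(R, p, w, mu=0.1, iterations=200):
    """Iterate w <- w - mu * (2*R*w - 2*p) for a 2-tap filter."""
    for _ in range(iterations):
        grad = [2.0 * (R[i][0] * w[0] + R[i][1] * w[1] - p[i]) for i in range(2)]
        w = [w[i] - mu * grad[i] for i in range(2)]
    return w

# Singular correlation matrix: eigenvalues 0 and 2 (illustration values).
R = [[1.0, 1.0], [1.0, 1.0]]
w_o = [1.0, 0.2]                                   # desired optimum point
p = [R[i][0] * w_o[0] + R[i][1] * w_o[1] for i in range(2)]

w_a = steepest_descent(R, p, [0.0, 0.0])           # start at the origin
w_b = steepest_descent(R, p, [1.0, -0.4])          # a different starting point
```

Both runs stop on the line $w_0 + w_1 = 1.2$ of minimum error, at the point nearest to their respective starting points ($[0.6, 0.6]$ and $[1.3, -0.1]$), and neither reaches $\mathbf{w}_o$.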


15.5.2 Reducing Coherence Between $x_1(n)$ and $x_2(n)$ 


As discussed earlier, the coherence between $x_1(n)$ and $x_2(n)$ results in a correlation 
matrix $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$ with a number of close to zero eigenvalues. As a result, the 
echo canceller may converge to a point with a possibly significant misalignment between 
the converged tap weights and the true tap weights of the room responses. To solve this 
problem, one should introduce some changes in $x_1(n)$ and $x_2(n)$ so that the coherence 
between them reduces. Here, we introduce three methods for achieving this reduction. 

Figure 15.18 Demonstration of a performance surface when the underlying correlation matrix has 
a zero eigenvalue. 


Coherence Reduction Through Additive Noise 


A simple method of reducing coherence between $x_1(n)$ and $x_2(n)$ that was proposed in the 
early days of development of AE cancellers was to add a pair of independent white noise 
processes to $x_1(n)$ and $x_2(n)$. This addition is done at a point before $x_1(n)$ and $x_2(n)$ are 
passed to the echo canceller and the loudspeakers in the near-end room. This method 
replaces the correlation matrix $\mathbf{R}$ by $\mathbf{R} + \sigma_\nu^2\mathbf{I}$, where $\sigma_\nu^2$ is the variance of the added noise 
samples. According to the results presented in Chapter 4, this boosts all the eigenvalues 
of the underlying correlation matrix by $\sigma_\nu^2$, and thus resolves the problem of the close to 
zero eigenvalues of $\mathbf{R}$, the main source of misalignment. However, unfortunately, this 
method was found to be unsuccessful because its successful operation requires the addition 
of an unacceptably high noise power that would be audible and thus annoying (Sondhi, 
Morgan, and Hall, 1995). 


Coherence Reduction Through Leaky LMS Algorithm 


The leaky LMS algorithm is an intelligent technique that, without adding any noise to 
the input to an LMS-based adaptive filter, increases the eigenvalues of the underlying 
correlation matrix by a controllable constant. The leaky LMS algorithm was presented in 
Chapter 6 under an exercise problem at the end of the chapter. Here, for the completeness 
of our discussion, we present the leaky LMS algorithm in the context of an AE canceller. 
The use of the leaky LMS algorithm to reduce the misalignment in AE cancellers was 
first suggested in Hoya et al. (1998). 

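The leaky LMS algorithm works based on the recursion 


$$\mathbf{w}(n+1) = \beta\mathbf{w}(n) + 2\mu e(n)\mathbf{x}(n) \tag{15.85}$$


where $\beta$ is a constant slightly smaller than 1. The properties of this algorithm can be best 
understood by looking at the convergence of the mean of $\mathbf{w}(n)$ as Eq. (15.85) iterates. 
In the case of the stereophonic AE canceller of Figure 15.17, 


$$\begin{aligned} e(n) &= d(n) - \hat{y}(n) \\ &= e_o(n) + y(n) - \hat{y}(n) \\ &= e_o(n) + (\mathbf{w}_o - \mathbf{w}(n))^T\mathbf{x}(n) \end{aligned} \tag{15.86}$$


Substituting Eq. (15.86) in Eq. (15.85), taking expectations on both sides of the result, 
recalling the independence assumption of Section 6.3, and defining $\bar{\mathbf{w}}(n) = E[\mathbf{w}(n)]$, 
we obtain 


$$\begin{aligned} \bar{\mathbf{w}}(n+1) &= \beta\bar{\mathbf{w}}(n) - 2\mu\mathbf{R}\bar{\mathbf{w}}(n) + 2\mu\mathbf{p} \\ &= (\beta\mathbf{I} - 2\mu\mathbf{R})\bar{\mathbf{w}}(n) + 2\mu\mathbf{p} \\ &= (\mathbf{I} - 2\mu\mathbf{R}')\bar{\mathbf{w}}(n) + 2\mu\mathbf{p} \end{aligned} \tag{15.87}$$

where 

$$\mathbf{R}' = \mathbf{R} + \frac{1-\beta}{2\mu}\mathbf{I} \tag{15.88}$$

and 

$$\mathbf{p} = \mathbf{R}\mathbf{w}_o \tag{15.89}$$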


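Recalling the derivations in Chapters 5 and 6, one may deduce from Eq. (15.87) that

• The modes of convergence of the leaky LMS algorithm are determined by the set of
eigenvalues of $\mathbf{R}'$, viz.,

$$\lambda_i' = \lambda_i + \frac{1-\beta}{2\mu} \tag{15.90}$$

where the $\lambda_i$'s are the eigenvalues of $\mathbf{R}$. Hence, assuming that $(1-\beta)/(2\mu)$ is sufficiently large,
this avoids the presence of any very slow mode of convergence in the algorithm.
• As $n \to \infty$, $\bar{\mathbf{w}}(n)$ converges to

$$\begin{aligned}
\bar{\mathbf{w}}_\infty &= \mathbf{R}'^{-1}\mathbf{p} \\
                        &= \mathbf{R}'^{-1}\mathbf{R}\mathbf{w}_o
\end{aligned} \tag{15.91}$$

This shows that $\bar{\mathbf{w}}_\infty \neq \mathbf{w}_o$; hence, the leaky LMS algorithm is bound to suffer from some
misalignment.

Using Eq. (15.91), the misalignment of the leaky LMS algorithm when $\mathbf{w}(n)$ has
approached $\bar{\mathbf{w}}_\infty$ is obtained as

$$\begin{aligned}
\zeta &= (\mathbf{w}_o - \mathbf{R}'^{-1}\mathbf{R}\mathbf{w}_o)^T(\mathbf{w}_o - \mathbf{R}'^{-1}\mathbf{R}\mathbf{w}_o) \\
      &= \mathbf{w}_o^T(\mathbf{I} - \mathbf{R}'^{-1}\mathbf{R})^2\mathbf{w}_o
\end{aligned} \tag{15.92}$$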


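Recall that, according to the unitary similarity transformation, $\mathbf{R}$ and $\mathbf{R}'$ may be
expanded as

$$\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T \tag{15.93}$$

and

$$\mathbf{R}' = \mathbf{Q}\boldsymbol{\Lambda}'\mathbf{Q}^T \tag{15.94}$$

where $\boldsymbol{\Lambda}$ and $\boldsymbol{\Lambda}'$ are the diagonal matrices of the eigenvalues of $\mathbf{R}$ and $\mathbf{R}'$, respectively,
and $\mathbf{Q}$ is the associated matrix of eigenvectors. Note that $\mathbf{R}$ and $\mathbf{R}'$ share the same set
of eigenvectors. Substituting Eqs. (15.93) and (15.94) in Eq. (15.92), recalling the identity
$\mathbf{Q}^T\mathbf{Q} = \mathbf{I}$ and the definition

$$\mathbf{Q} = [\mathbf{q}_0\ \ \mathbf{q}_1\ \ \cdots\ \ \mathbf{q}_{2N-1}] \tag{15.95}$$

and rearranging the result, we obtain

$$\zeta = \left(\frac{1-\beta}{2\mu}\right)^2 \sum_{i=0}^{2N-1} \frac{(\mathbf{q}_i^T\mathbf{w}_o)^2}{\lambda_i'^2} \tag{15.96}$$

This result shows that, on one hand, we should choose $\beta$ sufficiently smaller than 1
($1-\beta$ large) to avoid slow modes of convergence. On the other hand, we should choose
$\beta$ sufficiently close to 1 ($1-\beta$ small) to reduce the misalignment.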
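
The trade-off governed by $\beta$ can be checked numerically. The following is a minimal sketch, not part of the original development: it iterates the mean recursion of Eq. (15.87) for a small hypothetical example (a diagonal 2-by-2 matrix $\mathbf{R}$ with one near-zero eigenvalue, and values of $\mu$ and $\beta$ picked only for illustration) and compares the resulting steady-state misalignment with Eq. (15.96). Because $\mathbf{R}$ is diagonal here, its eigenvectors are the unit vectors, so $\mathbf{q}_i^T\mathbf{w}_o$ is simply the ith element of $\mathbf{w}_o$.

```python
def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def leaky_mean_weights(R, w_o, mu, beta, n_iter):
    """Iterate w_bar(n+1) = beta*w_bar(n) - 2*mu*R*w_bar(n) + 2*mu*p, with p = R*w_o."""
    p = mat_vec(R, w_o)
    w_bar = [0.0] * len(w_o)
    for _ in range(n_iter):
        Rw = mat_vec(R, w_bar)
        w_bar = [beta * w - 2 * mu * rw + 2 * mu * pi
                 for w, rw, pi in zip(w_bar, Rw, p)]
    return w_bar


def misalignment_diag(eigs, w_o, mu, beta):
    """Eq. (15.96) for a diagonal R, where q_i^T w_o is the ith entry of w_o."""
    leak = (1.0 - beta) / (2.0 * mu)
    return leak ** 2 * sum(w ** 2 / (lam + leak) ** 2 for lam, w in zip(eigs, w_o))


# Hypothetical example values (assumptions for illustration, not from the text).
eigs = [1.0, 1e-4]
R = [[eigs[0], 0.0], [0.0, eigs[1]]]
w_o = [1.0, 1.0]
mu, beta = 0.05, 0.999

w_inf = leaky_mean_weights(R, w_o, mu, beta, n_iter=20000)
zeta_sim = sum((a - b) ** 2 for a, b in zip(w_o, w_inf))
zeta_formula = misalignment_diag(eigs, w_o, mu, beta)
print(w_inf, zeta_sim, zeta_formula)
```

With these values, the leak term $(1-\beta)/(2\mu) = 0.01$ lifts the small eigenvalue from $10^{-4}$ to about $10^{-2}$, which removes the very slow mode, but the weight along that eigenvector settles near 0.01 instead of 1, so the misalignment is large.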


Coherence Reduction Through Exclusive Maximum Tap-Selection 


The exclusive maximum (XM) tap-selection method is an adaptation strategy that is
applicable to various adaptive algorithms, including the NLMS, affine projection LMS,
and RLS algorithms. The basic idea is to apply each adaptation step only to those taps
with the larger-amplitude tap inputs. Khong and Naylor (2005, 2007) have studied the
application of the XM tap-selection method to stereophonic AE cancellers and noted that
this method has some success in reducing the misalignment. The rationale here is that
at each iteration of the adaptive algorithm, the sets of tap inputs selected from the two
channels are randomly unaligned, and this results in a lower coherence among the signals
from the two channels that are used for adaptation.


Coherence Reduction Through Nonlinearity 


It has been noted that the introduction of some nonlinearity into a speech signal is, to a great
degree, perceptually undetectable. On the other hand, the addition of such nonlinearity
reduces the coherence between the pair of signals in a stereophonic AE canceller, and
thus resolves the problem of a large misalignment. Morgan, Hall, and Benesty (2001)
investigated several types of nonlinearities for stereophonic AE cancellers and noted that
they all have a similar effect in reducing the coherence between the processed signals.
Among the types of nonlinearities evaluated, the half-wave rectifier, defined as

$$\tilde{x}(n) = x(n) + \alpha\,\frac{x(n) + |x(n)|}{2} \tag{15.97}$$

where $\alpha$ is a constant that determines the amount of the added nonlinearity, was recognized
as the simplest to implement and the one with minimal effects on the speech
quality. Moreover, various authors have studied the combinations of nonlinearities and
various adaptive filtering algorithms, for example, the leaky LMS and XM tap-selection
methods, and have demonstrated that such combinations can lead to systems with very
good or acceptable misalignment performance.
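
As a minimal sketch of the half-wave rectifier of Eq. (15.97), the function below applies $x(n) + \alpha\,(x(n) + |x(n)|)/2$ sample by sample. The function name, the sample values, and the value of $\alpha$ are assumptions made for this illustration only.

```python
def half_wave_rectifier(x, alpha):
    """Apply x_tilde = x + alpha * (x + |x|) / 2 to each sample of a sequence."""
    return [s + alpha * (s + abs(s)) / 2.0 for s in x]


samples = [2.0, -2.0, 0.5, 0.0]          # hypothetical input samples
print(half_wave_rectifier(samples, alpha=0.5))
```

Positive samples are scaled by $1 + \alpha$, while negative samples pass through unchanged; this asymmetry is the added nonlinearity.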


15.5.3 The LMS-Newton Algorithm for Stereophonic Systems 


In Section 15.2, a number of adaptive filter structures and algorithms were introduced
for monophonic echo cancellers, namely, the normalized LMS, affine projection LMS,
frequency domain block LMS, and subband LMS algorithms, and an efficient implementation of the
LMS-Newton algorithm. Among these, we found that the affine projection LMS algorithm has
the fastest rate of convergence; however, it settles at a somewhat lower ERLE than
the rest of the algorithms (except the subband LMS algorithm, whose lower ERLE was
attributed to the imperfect responses of the analysis and synthesis filters). The algorithms
presented in Tables 15.1 and 15.2 are directly applicable to the stereophonic case, simply
by redefining the tap-weight vector $\mathbf{w}(n)$ and the tap-input vector $\mathbf{x}(n)$ according to
Eqs. (15.66) and (15.65), respectively. Development of the frequency domain block LMS
algorithm for stereophonic AE cancellers is also not difficult. However, the extension of
the LMS-Newton algorithm is not that straightforward.

Noting that the LMS-Newton algorithm has superior performance over the rest of the 
algorithms, in the rest of this chapter, we present a full development of the LMS-Newton 
algorithm for stereophonic AE cancellation. This requires a thorough understanding of the 
two-channel lattice predictor and how this can be used to obtain an estimate of the inverse 
of the correlation matrix R based on autoregressive modeling. This development is parallel 
to the single-channel lattice structures and algorithms that were presented in Chapter 11. 
Also, we will find that the application of the half-wave rectifier, Eq. (15.97), to improve on 
the misalignment of the LMS-Newton stereophonic echo canceller requires some special 
treatment. The details of this treatment will also be presented. The results presented in 
this section have been developed in Rao and Farhang-Boroujeny (2009) and (2010). 


Two-Channel Lattice Predictor 


Recall the one-channel lattice predictor that was developed in Chapter 11. In particular,
recall the order-update equations (11.60) and (11.61) of the mth stage of the predictor. In
the case of the two-channel predictor, the duals of Eqs. (11.60) and (11.61) are

$$\mathbf{f}_m(n) = \mathbf{f}_{m-1}(n) - \boldsymbol{\kappa}_m^T\mathbf{b}_{m-1}(n-1) \tag{15.98}$$

and

$$\mathbf{b}_m(n) = \mathbf{b}_{m-1}(n-1) - \boldsymbol{\kappa}_m\mathbf{f}_{m-1}(n) \tag{15.99}$$

where $\mathbf{f}_m(n) = [f_{1,m}(n)\ f_{2,m}(n)]^T$ and $\mathbf{b}_m(n) = [b_{1,m}(n)\ b_{2,m}(n)]^T$ are the 2 × 1 forward
and backward prediction-error vectors, respectively, and the 2 × 2 reflection coefficient
matrix $\boldsymbol{\kappa}_m$ is defined as

$$\boldsymbol{\kappa}_m = \begin{bmatrix} \kappa_{11,m} & \kappa_{12,m} \\ \kappa_{21,m} & \kappa_{22,m} \end{bmatrix} \tag{15.100}$$

Figure 15.19 Details of Stage m of a two-channel lattice predictor. All the signals are 2-by-1
vectors and the partial correlation (PARCOR) coefficients, $\boldsymbol{\kappa}_m$, are 2-by-2 matrices.

Figure 15.19 depicts a two-channel lattice predictor structure, and the details of its mth
stage. The lattice predictor is initialized with $\mathbf{f}_0(n) = \mathbf{b}_0(n) = \underline{\mathbf{x}}(n)$, where

$$\underline{\mathbf{x}}(n) = \begin{bmatrix} x_1(n) \\ x_2(n) \end{bmatrix} \tag{15.101}$$


As in the single-channel case, a simple gradient adaptive algorithm can
be used to compute $\boldsymbol{\kappa}_m(n)$ in a recursive manner. The reflection coefficients of the mth
cell can be chosen so as to minimize the sum of the instantaneous forward and backward
prediction-error energies of the corresponding cell, that is, $\mathbf{f}_m^T(n)\mathbf{f}_m(n) + \mathbf{b}_m^T(n)\mathbf{b}_m(n)$. This leads to the
following LMS recursions

$$\kappa_{ij,m}(n+1) = \kappa_{ij,m}(n) + \frac{2\mu_{p,o}}{P_{ij,m-1}(n) + \epsilon}\left[f_{j,m}(n)\,b_{i,m-1}(n-1) + f_{j,m-1}(n)\,b_{i,m}(n)\right] \tag{15.102}$$

for $(i, j) \in \{(1, 1), (1, 2), (2, 1), (2, 2)\}$,

where $\mu_{p,o}$ is the adaptation step-size parameter, and $\epsilon$ is a constant added to prevent
gradient noise amplification when any of the power terms $P_{ij,m-1}(n)$ is small. The
power terms are estimated using the following recursive equations

$$P_{ij,m-1}(n) = \beta P_{ij,m-1}(n-1) + 0.5(1-\beta)\left[b_{i,m-1}^2(n-1) + f_{j,m-1}^2(n)\right] \tag{15.103}$$

for $(i, j) \in \{(1, 1), (1, 2), (2, 1), (2, 2)\}$,

where $\beta$ is a constant close to but smaller than 1.
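
Before the coefficients are adapted, each stage applies the order update of Eqs. (15.98) and (15.99). The sketch below is an illustration under the assumption that these equations read $\mathbf{f}_m(n) = \mathbf{f}_{m-1}(n) - \boldsymbol{\kappa}_m^T\mathbf{b}_{m-1}(n-1)$ and $\mathbf{b}_m(n) = \mathbf{b}_{m-1}(n-1) - \boldsymbol{\kappa}_m\mathbf{f}_{m-1}(n)$; the function name and the numerical values are hypothetical.

```python
def lattice_stage(f_prev, b_prev_delayed, kappa):
    """One order update of a two-channel lattice stage with a fixed kappa.

    f_prev         : f_{m-1}(n), list of 2 floats
    b_prev_delayed : b_{m-1}(n-1), list of 2 floats
    kappa          : 2x2 reflection-coefficient matrix as a list of rows
    Returns the pair (f_m(n), b_m(n)).
    """
    # f_m(n) = f_{m-1}(n) - kappa^T b_{m-1}(n-1)
    f = [f_prev[j] - sum(kappa[i][j] * b_prev_delayed[i] for i in range(2))
         for j in range(2)]
    # b_m(n) = b_{m-1}(n-1) - kappa f_{m-1}(n)
    b = [b_prev_delayed[i] - sum(kappa[i][j] * f_prev[j] for j in range(2))
         for i in range(2)]
    return f, b


kappa = [[0.5, 0.0], [0.25, 0.0]]        # hypothetical PARCOR matrix
f_m, b_m = lattice_stage([1.0, 2.0], [4.0, 8.0], kappa)
print(f_m, b_m)
```

Note that the forward error uses the transpose of the PARCOR matrix while the backward error uses the matrix itself, which is the point where the two-channel stage departs from its scalar counterpart.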

Upon the convergence of the PARCOR coefficient matrices $\boldsymbol{\kappa}_m$, the backward prediction
errors $\mathbf{b}_m(n)$ will constitute a set of 2-by-1 uncorrelated random vectors. Following
a similar line of thought that led to the lattice joint process estimator of Figure 11.7, an
AE canceller may be implemented by passing the backward prediction errors $\mathbf{b}_m(n)$ to a
linear combiner. Hence, the input to the linear combiner will be the vector

$$\begin{aligned}
\mathbf{b}(n) &= [\mathbf{b}_0^T(n)\ \ \mathbf{b}_1^T(n)\ \ \cdots\ \ \mathbf{b}_{N-1}^T(n)]^T \\
              &= [b_{1,0}(n)\ \ b_{2,0}(n)\ \ b_{1,1}(n)\ \ b_{2,1}(n)\ \ \cdots\ \ b_{1,N-1}(n)\ \ b_{2,N-1}(n)]^T
\end{aligned} \tag{15.104}$$

This is a vector of length 2N.

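If we use $\mathbf{c}(n)$ to denote the tap-weight vector of the 2N-tap linear combiner at
time instant n, the output of the lattice joint process estimator (here, the estimated echo
signal) is given by

$$\hat{y}(n) = \mathbf{c}^T(n)\mathbf{b}(n) \tag{15.105}$$

The output error is then obtained as

$$e(n) = d(n) - \hat{y}(n) \tag{15.106}$$

and $\mathbf{c}(n)$ may be updated using the recursive equation

$$\mathbf{c}(n+1) = \mathbf{c}(n) + 2\mu\mathbf{R}_{bb}^{-1}\mathbf{b}(n)e(n) \tag{15.107}$$

where $\mathbf{R}_{bb} = E[\mathbf{b}(n)\mathbf{b}^T(n)]$. In the case of the single-channel lattice joint process estimator,
$\mathbf{R}_{bb}$ is a diagonal matrix; hence, $\mathbf{R}_{bb}^{-1}$ is trivially obtained by inverting the diagonal
elements of $\mathbf{R}_{bb}$. This is not the case in a two-channel joint process estimator:
$E[\mathbf{b}_m(n)\mathbf{b}_j^T(n)] = \mathbf{0}$ for $m \neq j$, but $\mathbf{R}_{b_m b_m} = E[\mathbf{b}_m(n)\mathbf{b}_m^T(n)]$ is not a diagonal matrix.
In other words, although $\mathbf{b}_m(n)$ and $\mathbf{b}_j(n)$, $m \neq j$, are uncorrelated, and thus $\mathbf{R}_{b_m b_j} = \mathbf{0}$, the
elements of $\mathbf{b}_m(n)$, for any m, are not necessarily uncorrelated. Accordingly, one should
note that

$$\mathbf{R}_{bb} = \begin{bmatrix}
\mathbf{R}_{b_0 b_0} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{R}_{b_1 b_1} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{R}_{b_{N-1} b_{N-1}}
\end{bmatrix} \tag{15.108}$$

and, hence,

$$\mathbf{R}_{bb}^{-1} = \begin{bmatrix}
\mathbf{R}_{b_0 b_0}^{-1} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{R}_{b_1 b_1}^{-1} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{R}_{b_{N-1} b_{N-1}}^{-1}
\end{bmatrix} \tag{15.109}$$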

Assuming that the PARCOR coefficients have converged to their optimum values and
have been kept fixed, the update Eq. (15.107) is effectively an LMS-Newton algorithm
and thus its behavior is governed by a single mode of convergence. However, in the
particular case of echo cancellers, where the PARCOR coefficients should be continuously adapted
to follow variations in the statistics of the input process (a speech signal), the lattice joint
process estimator suffers from a significant misadjustment, which renders its application
less attractive. The reader is referred to Section 11.13.1 for a more detailed discussion
on this topic. In the sequel, we follow derivations similar to those in Section 11.15 to
develop the duals of Algorithms 1 and 2 for the case of stereophonic AE cancellers,
that is, the two-channel case.


Algorithms 


Two versions of the LMS-Newton algorithm based on autoregressive modeling were
proposed for the single-channel case in Section 11.15. These were based on the fact
that the input sequence (a speech signal) to the adaptive filter can be modeled as an
autoregressive process of order M, where M can be much smaller than the filter length
N. This results in an efficient way of updating the term $\mathbf{R}^{-1}\mathbf{x}(n)$ of the LMS-Newton
update Eq. (7.32) without having to estimate $\mathbf{R}^{-1}$, where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$. Here, we
derive the LMS-Newton algorithms for the stereophonic setup, that is, the two-channel case. The
reader may note that some of the derivations here are very similar to those presented in
Section 11.15. However, differences also exist.

As in Section 11.15, in the rest of this chapter, we add the subscript xx to $\mathbf{R}$ when
a reference is made to the correlation matrix of $\mathbf{x}(n)$. This is to differentiate $\mathbf{R}_{xx}$ from the
other correlation matrices that we also encounter in the sequel.

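For a stereophonic system, we note that $\mathbf{b}(n)$ can be expressed as

$$\mathbf{b}(n) = \mathbf{L}\mathbf{x}(n) \tag{15.110}$$

where $\mathbf{x}(n)$ is the 2N × 1 filter input vector, defined by Eq. (15.66), $\mathbf{b}(n)$ is the 2N × 1
vector of backward prediction errors, defined by Eq. (15.104), and $\mathbf{L}$ is the 2N × 2N
transformation matrix, which has the form

$$\mathbf{L} = \begin{bmatrix}
\mathbf{I} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0} \\
-\mathbf{G}_{1,1} & \mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0} \\
-\mathbf{G}_{2,1} & -\mathbf{G}_{2,2} & \mathbf{I} & \cdots & \mathbf{0} & \mathbf{0} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
-\mathbf{G}_{N-1,1} & -\mathbf{G}_{N-1,2} & -\mathbf{G}_{N-1,3} & \cdots & -\mathbf{G}_{N-1,N-1} & \mathbf{I}
\end{bmatrix} \tag{15.111}$$

where $\mathbf{I}$ is a 2 × 2 identity matrix, $\mathbf{0}$ is a 2 × 2 zero matrix, and each $\mathbf{G}_{m,j}$ is a 2 × 2
backward predictor coefficient matrix. The transformation (15.110) provides a one-to-one
correspondence between the input vector $\mathbf{x}(n)$ and the backward prediction-error vector
$\mathbf{b}(n)$. From Eq. (15.110), we have $\mathbf{R}_{bb} = \mathbf{L}\mathbf{R}_{xx}\mathbf{L}^T$, and it follows that

$$\mathbf{R}_{xx}^{-1} = \mathbf{L}^T\mathbf{R}_{bb}^{-1}\mathbf{L} \tag{15.112}$$

Next, we recall the ideal LMS-Newton update equation

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\mathbf{R}_{xx}^{-1}\mathbf{x}(n)e(n) \tag{15.113}$$

and note that using Eqs. (15.110) and (15.112) in Eq. (15.113), one will obtain

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\mathbf{u}(n)e(n) \tag{15.114}$$

where

$$\mathbf{u}(n) = \mathbf{L}^T\mathbf{R}_{bb}^{-1}\mathbf{b}(n) \tag{15.115}$$

The significance of Eq. (15.114) is that the computation of $\mathbf{u}(n)$ according to
Eq. (15.115) can be performed at a low complexity. In the sequel, following a
line of derivations similar to those in Chapter 11, we present two implementations of the
LMS-Newton algorithm based on Eqs. (15.114) and (15.115). These are referred to as
Algorithm 1 and Algorithm 2.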


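Algorithm 1: This algorithm involves the direct implementation of Eq. (15.115)
through the use of a lattice predictor. If we assume that the input sequence is an
autoregressive process of order M < N, a lattice predictor of order M is sufficient. The
matrix $\mathbf{L}$ then takes the form of

$$\mathbf{L} = \begin{bmatrix}
\mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0} \\
-\mathbf{G}_{1,1} & \mathbf{I} & \cdots & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots & \vdots \\
-\mathbf{G}_{M,1} & -\mathbf{G}_{M,2} & \cdots & \mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & -\mathbf{G}_{M,1} & \cdots & -\mathbf{G}_{M,M} & \mathbf{I} & \cdots & \mathbf{0} & \mathbf{0} \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} & -\mathbf{G}_{M,1} & \cdots & -\mathbf{G}_{M,M} & \mathbf{I}
\end{bmatrix} \tag{15.116}$$

and the vector $\mathbf{b}(n)$ takes the form of

$$\begin{aligned}
\mathbf{b}(n) = [&b_{1,0}(n),\ b_{2,0}(n),\ b_{1,1}(n),\ b_{2,1}(n),\ \ldots,\ b_{1,M}(n),\ b_{2,M}(n),\ b_{1,M}(n-1), \\
&b_{2,M}(n-1),\ \ldots,\ b_{1,M}(n-L+M+1),\ b_{2,M}(n-L+M+1)]^T
\end{aligned} \tag{15.117}$$

The special structure of $\mathbf{b}(n)$ in Eq. (15.117) requires us to update only the first 2(M + 1)
elements of $\mathbf{b}(n)$. The remaining elements are just the delayed versions of
$\mathbf{b}_M(n) = [b_{1,M}(n)\ b_{2,M}(n)]^T$.

We first consider the multiplication of $\mathbf{b}(n)$ by $\mathbf{R}_{bb}^{-1}$. It involves the estimation of the
correlation matrices $\mathbf{R}_{b_0 b_0}$ through $\mathbf{R}_{b_M b_M}$. These may be obtained using the recursive
equations

$$\mathbf{R}_{b_m b_m}(n+1) = \beta\mathbf{R}_{b_m b_m}(n) + (1-\beta)\mathbf{b}_m(n)\mathbf{b}_m^T(n) \tag{15.118}$$

These estimates are used to construct

$$\mathbf{R}_{bb}^{-1} = \begin{bmatrix}
\mathbf{R}_{b_0 b_0}^{-1} & \mathbf{0} & \cdots & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{R}_{b_1 b_1}^{-1} & \cdots & \mathbf{0} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots & & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{R}_{b_M b_M}^{-1} & \cdots & \mathbf{0} \\
\vdots & \vdots & & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} & \cdots & \mathbf{R}_{b_M b_M}^{-1}
\end{bmatrix} \tag{15.119}$$

We note that the (M + 1)th diagonal block of $\mathbf{R}_{bb}^{-1}$ repeats for its subsequent diagonal
blocks. We also note that in a typical AE canceller, M can be chosen to be as small as
8, and inverting M + 1 matrices of size 2 × 2 constitutes only a small percentage of the AE
canceller complexity.

To complete the computation of $\mathbf{u}(n)$, we now have to multiply $\mathbf{L}^T$ by $\mathbf{R}_{bb}^{-1}\mathbf{b}(n)$. A
close examination of the structures of the matrix $\mathbf{L}$ and the vector $\mathbf{b}(n)$ described by Eqs.
(15.116) and (15.117), respectively, reveals that in order to compute $\mathbf{u}(n)$, only the
first 2(M + 1) and the last 2M elements of $\mathbf{u}(n)$ need to be computed. The remaining
elements of $\mathbf{u}(n)$ are the delayed versions of its (2M + 1)th and (2M + 2)nd elements. The
elements of $\mathbf{L}$ can be estimated using the two-channel Levinson–Durbin algorithm, an
extension of the single-channel Levinson–Durbin algorithm. This extension is presented
in Appendix 15B. Note that only the coefficients of prediction filters of order
1 to M need to be computed. Accordingly, we formulate Algorithm 1 as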


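1. Run the lattice predictor of order M using Eqs. (15.98), (15.99), (15.102), and
(15.103) to obtain the reflection coefficients and the backward prediction errors.

2. Run the two-channel Levinson–Durbin algorithm of Appendix 15B to convert the
reflection coefficients to the backward predictor coefficients $\mathbf{G}_{m,j}$. The equations for
this particular step are shown below for completeness.

$$\mathbf{A}_{1,1}(n) = \boldsymbol{\kappa}_1^T(n)$$
$$\mathbf{G}_{1,1}(n) = \boldsymbol{\kappa}_1(n)$$

for m = 1 : M − 1

$$\mathbf{A}_{m+1,j}(n) = \mathbf{A}_{m,j}(n) - \boldsymbol{\kappa}_{m+1}^T(n)\mathbf{G}_{m,j}(n); \quad j = 1, 2, \ldots, m$$
$$\mathbf{A}_{m+1,m+1}(n) = \boldsymbol{\kappa}_{m+1}^T(n)$$
$$\mathbf{G}_{m+1,j+1}(n) = \mathbf{G}_{m,j}(n) - \boldsymbol{\kappa}_{m+1}(n)\mathbf{A}_{m,j}(n); \quad j = 1, 2, \ldots, m$$
$$\mathbf{G}_{m+1,1}(n) = \boldsymbol{\kappa}_{m+1}(n)$$

where each $\mathbf{A}_{m,j}$ is a 2 × 2 forward predictor coefficient matrix.$^1$ The above results
may be substituted in Eq. (15.116) to form the transformation matrix $\mathbf{L}$.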


$^1$ It is important to note that in the single-channel lattice, the forward and backward predictor coefficients are
related according to the equation $a_{m,j} = g_{m,m+1-j}$; see Chapter 11. However, such a relationship does not hold

3. Compute the elements that are the delayed versions of the (2M + 1)th and (2M + 2)nd
elements of $\mathbf{u}(n)$ from the previous iteration as

$$u_{1,j}(n) = u_{1,j-1}(n-1); \quad j = M+1, M+2, \ldots, L-M-1$$
$$u_{2,j}(n) = u_{2,j-1}(n-1); \quad j = M+1, M+2, \ldots, L-M-1$$

Note that $\mathbf{u}_j(n) = [u_{1,j}(n)\ u_{2,j}(n)]^T$ is the jth 2 × 1 vector component of $\mathbf{u}(n)$.
4. Compute the first 2(M + 1) elements of $\mathbf{u}(n)$ as

$$[u_{1,0}(n),\ u_{2,0}(n),\ u_{1,1}(n),\ u_{2,1}(n),\ \ldots,\ u_{1,M}(n),\ u_{2,M}(n)]^T = \mathbf{L}_{tl}^T\mathbf{b}_h(n)$$

where

$$\mathbf{b}_h(n) = \begin{bmatrix}
\mathbf{R}_{b_0 b_0}^{-1}(n)\mathbf{b}_0(n) \\
\mathbf{R}_{b_1 b_1}^{-1}(n)\mathbf{b}_1(n) \\
\vdots \\
\mathbf{R}_{b_M b_M}^{-1}(n)\mathbf{b}_M(n) \\
\mathbf{R}_{b_M b_M}^{-1}(n-1)\mathbf{b}_M(n-1) \\
\vdots \\
\mathbf{R}_{b_M b_M}^{-1}(n-M)\mathbf{b}_M(n-M)
\end{bmatrix}$$

and $\mathbf{L}_{tl}$ denotes the top-left part of $\mathbf{L}$ having dimension 2(2M + 1) × 2(M + 1).
5. Compute the last 2M elements of $\mathbf{u}(n)$ as

$$[u_{1,L-M}(n),\ u_{2,L-M}(n),\ \ldots,\ u_{1,L-1}(n),\ u_{2,L-1}(n)]^T = \mathbf{L}_{br}^T\mathbf{b}_t(n)$$

where

$$\mathbf{b}_t(n) = \begin{bmatrix}
\mathbf{R}_{b_M b_M}^{-1}(n-L+M+1)\mathbf{b}_M(n-L+2M) \\
\mathbf{R}_{b_M b_M}^{-1}(n-L+M+1)\mathbf{b}_M(n-L+2M-1) \\
\vdots \\
\mathbf{R}_{b_M b_M}^{-1}(n-L+M+1)\mathbf{b}_M(n-L+M+1)
\end{bmatrix}$$

and $\mathbf{L}_{br}$ is the bottom-right part of $\mathbf{L}$ having dimension 2M × 2M.

6. Finally, compute the adaptive filter output $\hat{y}(n) = \mathbf{w}^T(n)\mathbf{x}(n)$ and the error signal $e(n) =
d(n) - \hat{y}(n)$, and update the filter taps using Eq. (15.114).
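
Step 2 of the procedure above, the conversion of the reflection coefficient matrices to predictor coefficient matrices, can be sketched as follows. This is an illustration under the assumption that the recursions read $\mathbf{A}_{m+1,j} = \mathbf{A}_{m,j} - \boldsymbol{\kappa}_{m+1}^T\mathbf{G}_{m,j}$, $\mathbf{A}_{m+1,m+1} = \boldsymbol{\kappa}_{m+1}^T$, $\mathbf{G}_{m+1,j+1} = \mathbf{G}_{m,j} - \boldsymbol{\kappa}_{m+1}\mathbf{A}_{m,j}$, and $\mathbf{G}_{m+1,1} = \boldsymbol{\kappa}_{m+1}$; the helper names and the PARCOR values are hypothetical, and the time index n is dropped because the coefficients are treated as fixed.

```python
def mat_mul(A, B):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_sub(A, B):
    """Difference of two 2x2 matrices."""
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]


def transpose(A):
    """Transpose of a 2x2 matrix."""
    return [[A[j][i] for j in range(2)] for i in range(2)]


def two_channel_levinson(kappas):
    """Convert kappa_1..kappa_M to order-M predictor coefficient matrices.

    Returns (A, G) with A[j-1] = A_{M,j} and G[j-1] = G_{M,j}, j = 1..M.
    """
    A = [transpose(kappas[0])]          # A_{1,1} = kappa_1^T
    G = [kappas[0]]                     # G_{1,1} = kappa_1
    for m in range(1, len(kappas)):
        k = kappas[m]                   # kappa_{m+1}
        kT = transpose(k)
        # A_{m+1,j} = A_{m,j} - kappa_{m+1}^T G_{m,j}; A_{m+1,m+1} = kappa_{m+1}^T
        A_new = [mat_sub(A[j], mat_mul(kT, G[j])) for j in range(m)] + [kT]
        # G_{m+1,1} = kappa_{m+1}; G_{m+1,j+1} = G_{m,j} - kappa_{m+1} A_{m,j}
        G_new = [k] + [mat_sub(G[j], mat_mul(k, A[j])) for j in range(m)]
        A, G = A_new, G_new
    return A, G


# Hypothetical PARCOR matrices for a predictor of order M = 2.
kappas = [[[0.3, -0.2], [0.1, 0.4]], [[-0.5, 0.25], [0.2, 0.1]]]
A, G = two_channel_levinson(kappas)
print(G[0] == kappas[-1], A[-1] == transpose(kappas[-1]))
```

In this sketch, the first backward matrix and the last forward matrix are transposes of each other by construction; the remaining blocks are not tied together in this simple way.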


in a two-channel lattice. Thus, some simplifications that are applicable to single-channel lattice equations are
inapplicable to the two-channel case. Consequently, direct mimicking of the results of Chapter 11 is not possible
here. This is why we present a fresh derivation of Algorithms 1 and 2.

To implement the lattice predictor (Step 1), we require 25M + 5 multiplications.
The two-channel Levinson–Durbin algorithm requires 8M(M − 1) multiplications.
We further need $6M^2 + 26M + 8$ multiplications to update $\mathbf{u}(n)$. Finally, 4N multiplications
are required to adaptively update the transversal filter coefficients. Hence, in
order to implement the fast LMS-Newton Algorithm 1, we require a total of $14M^2 +
43M + 13 + 4N$ multiplications. The number of the required additions is about the same.
Typically, M can take a value of 8 and the adaptive filter length N may be 1500, for a
medium-size office room. With these numbers, each update of $\mathbf{u}(n)$ would make up only
17% of the total computational complexity of the AE canceller.
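
The quoted operation counts can be checked with a few lines of arithmetic. The sketch below simply adds up the multiplication counts stated above for Algorithm 1; the function name is an assumption made for this illustration, and "overhead" here means everything needed to form $\mathbf{u}(n)$ (lattice, Levinson–Durbin, and the update of $\mathbf{u}(n)$).

```python
def algorithm1_multiplications(M, N):
    """Per-iteration multiplication counts for Algorithm 1, as quoted in the text."""
    lattice = 25 * M + 5                   # lattice predictor of order M
    levinson = 8 * M * (M - 1)             # two-channel Levinson-Durbin
    update_u = 6 * M ** 2 + 26 * M + 8     # forming u(n)
    filter_update = 4 * N                  # transversal filter output and update
    overhead = lattice + levinson + update_u
    return overhead, overhead + filter_update


overhead, total = algorithm1_multiplications(M=8, N=1500)
print(overhead, total, round(100 * overhead / total))   # prints: 1253 7253 17
```

The overhead equals $14M^2 + 43M + 13$, which for M = 8 is 1253 out of a total of 7253 multiplications, in agreement with the 17% figure.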


Algorithm 2: The two-channel LMS-Newton Algorithm 1 is structurally complicated
despite having a reasonably low computational complexity. Manipulating the data is not
straightforward, and hence the algorithm is better suited to software implementation.
We now propose an alternative algorithm that is computationally less complex and can be
easily implemented in hardware.

If we look at the matrix $\mathbf{L}$ given in Eq. (15.116), we observe that only the first 2(M + 1)
rows of this matrix are uniquely represented. The remaining rows are just the delayed
versions of the (2M + 1)th and (2M + 2)nd rows, given by

$$[-\mathbf{G}_{M,1}\ \ -\mathbf{G}_{M,2}\ \ \cdots\ \ -\mathbf{G}_{M,M}\ \ \mathbf{I}\ \ \mathbf{0}\ \ \cdots\ \ \mathbf{0}]$$


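As discussed in Chapter 11, for the single-channel case, Algorithm 1 may be greatly
simplified by extending the input and tap-weight vectors $\mathbf{x}(n)$ and $\mathbf{w}(n)$ to the following
vectors

$$\begin{aligned}
\mathbf{x}_E(n) = [&x_1(n+M),\ x_2(n+M),\ \ldots,\ x_1(n+1),\ x_2(n+1),\ x_1(n),\ x_2(n),\ \ldots, \\
&x_1(n-L+1),\ x_2(n-L+1),\ \ldots,\ x_1(n-L-M+1),\ x_2(n-L-M+1)]^T
\end{aligned} \tag{15.120}$$

and

$$\begin{aligned}
\mathbf{w}_E(n) = [&w_{1,-M}(n),\ w_{2,-M}(n),\ \ldots,\ w_{1,-1}(n),\ w_{2,-1}(n),\ w_{1,0}(n),\ w_{2,0}(n),\ \ldots, \\
&w_{1,L-1}(n),\ w_{2,L-1}(n),\ \ldots,\ w_{1,L+M-1}(n),\ w_{2,L+M-1}(n)]^T
\end{aligned} \tag{15.121}$$

respectively, and applying Eq. (15.114) to update the extended tap-weight vector $\mathbf{w}_E(n)$.
We also need to appropriately take care of the dimensions of $\mathbf{L}$ and $\mathbf{R}_{bb}$. Moreover,
because we are interested only in the tap weights $w_{1,0}(n)$, $w_{2,0}(n)$,
$w_{1,1}(n)$, $w_{2,1}(n)$, ..., $w_{1,L-1}(n)$, $w_{2,L-1}(n)$, the first 2M and the last 2M elements of
the extended tap-weight vector can be permanently set to zero. This also removes the
computation of the first 2M and the last 2M elements of $\mathbf{L}^T\mathbf{R}_{bb}^{-1}\mathbf{L}\mathbf{x}_E(n)$. This leads to the
following update equation:

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu\mathbf{u}_a(n)e(n) \tag{15.122}$$

where

$$\mathbf{u}_a(n) = \mathbf{L}_2\mathbf{R}_{bb}^{-1}(n)\mathbf{L}_1\mathbf{x}_E(n) \tag{15.123}$$

Also, $\mathbf{L}_1$ is a 2(L + M) × 2(L + 2M) matrix defined as

$$\mathbf{L}_1 = \begin{bmatrix}
-\mathbf{G}_{M,1} & -\mathbf{G}_{M,2} & \cdots & \mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & -\mathbf{G}_{M,1} & \cdots & -\mathbf{G}_{M,M} & \mathbf{I} & \cdots & \mathbf{0} & \mathbf{0} \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} & -\mathbf{G}_{M,1} & \cdots & -\mathbf{G}_{M,M} & \mathbf{I}
\end{bmatrix} \tag{15.124}$$

and $\mathbf{L}_2$ is a 2L × 2(L + M) matrix defined as

$$\mathbf{L}_2 = \begin{bmatrix}
\mathbf{I} & -\mathbf{G}_{M,M}^T & \cdots & -\mathbf{G}_{M,1}^T & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{I} & \cdots & -\mathbf{G}_{M,2}^T & -\mathbf{G}_{M,1}^T & \cdots & \mathbf{0} & \mathbf{0} \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{I} & -\mathbf{G}_{M,M}^T & \cdots & -\mathbf{G}_{M,2}^T & -\mathbf{G}_{M,1}^T
\end{bmatrix} \tag{15.125}$$

On examining Eq. (15.123), we can see that it is only necessary to update the first 2 × 1
element vector of $\mathbf{R}_{bb}^{-1}(n)\mathbf{L}_1\mathbf{x}_E(n)$ and then the first 2 × 1 element vector of the final result,
$\mathbf{u}_a(n)$. The remaining elements will be the delayed versions of these first two elements.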
On the other hand, recall that the forward and backward prediction errors are given as

$$\mathbf{f}_m(n) = \underline{\mathbf{x}}(n) - \sum_{j=1}^{m}\mathbf{A}_{m,j}\,\underline{\mathbf{x}}(n-j) \tag{15.126}$$

and

$$\mathbf{b}_m(n) = \underline{\mathbf{x}}(n-m) - \sum_{j=1}^{m}\mathbf{G}_{m,j}\,\underline{\mathbf{x}}(n-j+1) \tag{15.127}$$

respectively, where $\underline{\mathbf{x}}(n) = [x_1(n)\ x_2(n)]^T$ is the 2 × 1 vector of input samples. It is well known that in a single-channel lattice $g_{m,j} = a_{m,m+1-j}$, j = 1,
2, ..., m, and this relationship between the forward and backward predictor coefficients
was used to derive the single-channel LMS-Newton algorithm 1 in Chapter 11. In a two-channel
lattice, $\mathbf{G}_{m,j} = \mathbf{A}_{m,m+1-j}^T$ holds only for j = 1 (refer to Eqs. (15B-6) and (15B-9) in
Appendix 15B). However, on the basis of extensive experimentation,
Rao and Farhang-Boroujeny (2009) found that this relationship also
holds approximately for j = 2, 3, .... Hence, we may write

$$\mathbf{G}_{m,j} \approx \mathbf{A}_{m,m+1-j}^T, \quad j = 1, 2, \ldots, m \tag{15.128}$$

The main motivation behind introducing this approximation is to use the transposed backward
predictor coefficients in reverse order to estimate the forward prediction errors,
as was done in the single-channel case in Chapter 11. Consequently, we rewrite
Eq. (15.126) as

$$\mathbf{f}_m(n) \approx \tilde{\mathbf{f}}_m(n) = \underline{\mathbf{x}}(n) - \sum_{j=1}^{m}\mathbf{G}_{m,m+1-j}^T\,\underline{\mathbf{x}}(n-j) \tag{15.129}$$

From Eqs. (15.124) and (15.127), we recognize that the filtering of the input vector
$\mathbf{x}(n+M)$ through a backward prediction-error filter to obtain $\mathbf{b}(n+M)$ is equivalent to
evaluating $\mathbf{L}_1\mathbf{x}_E(n)$. The backward prediction-error vector $\mathbf{b}(n+M)$ is normalized with
$\mathbf{R}_{bb}^{-1}(n+M)$. This gives us an update of $\mathbf{R}_{bb}^{-1}(n+M)\mathbf{L}_1\mathbf{x}_E(n)$. We then use the normalized
backward prediction-error vector $\mathbf{R}_{bb}^{-1}(n+M)\mathbf{b}_M(n+M)$ as an input to a filter
whose coefficients are the transposed duplicates of the backward prediction-error filter in
reverse order. We recognize from Eqs. (15.125) and (15.129) that this filter turns out to be
the forward prediction-error filter, assuming that Eq. (15.128) holds. As a result, the output
of the forward prediction-error filter provides us with the samples of the vector $\mathbf{u}_a(n)
= \mathbf{L}_2\mathbf{R}_{bb}^{-1}(n)\mathbf{L}_1\mathbf{x}_E(n)$. Thus, we can see that the approximation introduced in Eq. (15.128)
facilitates the development of an algorithm that can be efficiently implemented in hardware.
At the same time, over a wide range of experiments, this algorithm has been found
to satisfactorily exhibit the fast converging characteristics of the LMS-Newton algorithm.

In a practical implementation, we choose the original input to the predictor filter to be
$\mathbf{x}(n)$ and not $\mathbf{x}(n+M)$. To account for this delay, the desired signal $d(n)$ is delayed by
M samples to be time aligned with $\mathbf{u}_a(n)$. This results in a delayed LMS algorithm
whose performance is very close to that of its nondelayed version when $M \ll L$. Moreover,
as the power terms are assumed to be time invariant over the length of the prediction
filters, the normalization block is moved to the output of the forward prediction-error
filter. Accordingly, we formulate Algorithm 2 as:


1. Run the lattice predictor of order M using Eqs. (15.98), (15.99), (15.102), and (15.103)
to obtain the reflection coefficients and the backward prediction errors.
2. Compute the elements of $\mathbf{u}_a(n)$ that are the delayed versions of its first two elements:

$$u_{a,1,j}(n) = u_{a,1,j-1}(n-1); \quad j = L-1, L-2, \ldots, 1$$
$$u_{a,2,j}(n) = u_{a,2,j-1}(n-1); \quad j = L-1, L-2, \ldots, 1$$

3. Run the lattice predictor of order M (the reflection coefficients have already been computed
in Step 1) with $\mathbf{b}_M(n)$ as the input to obtain the forward prediction-error vector $\mathbf{f}_M(n)$.

4. Compute the first two elements of $\mathbf{u}_a(n)$ as the first two elements of $\mathbf{f}_M(n)$ premultiplied
by the 2 × 2 normalization matrix $\mathbf{R}_{b_M b_M}^{-1}(n)$.

5. Finally, compute the adaptive filter output $\hat{y}(n) = \mathbf{w}^T(n)\mathbf{x}(n)$ and the error signal $e(n) =
d(n) - \hat{y}(n)$, and update the filter taps using Eq. (15.122).


This particular version of the LMS-Newton algorithm is computationally less intensive
than Algorithm 1. To implement the lattice predictor, we require 25M + 5
multiplications. Updating $\mathbf{u}_a(n)$ using the forward prediction-error filter requires a further
8M + 8 multiplications. If we include the adaptive transversal filter update, Algorithm
2 requires a total of 33M + 13 + 4N multiplications. Thus, for M = 8 and N = 1500,
updating $\mathbf{u}_a(n)$ constitutes only 4% of the total complexity of the AE canceller.


Appendix 15A: Multitaper method


The multitaper method is a nonparametric spectrum analysis technique. It may be thought
of as a generalization of the celebrated periodogram spectral estimation. In this appendix,
we start with a brief review of the periodogram spectral estimation method. The shortcomings
of the periodogram spectral estimation method are then highlighted, and the
multitaper method is introduced as a way of overcoming these shortcomings. Efficient
implementation of the multitaper method, using polyphase structures, is also reviewed for
completeness of the discussion.


Periodogram Spectral Estimation 


The periodogram spectral estimator is the most basic and simplest member of the class
of nonparametric spectral estimators. It obtains an estimate of the spectrum $\Phi_{xx}(e^{j\omega})$ of
a random process $x(n)$, based on M samples of one realization of it, as

$$\hat{\Phi}_{xx}(e^{j\omega_i}) = \left|\sum_{k=0}^{M-1} h_{i,k}\,x(n-k)\right|^2 \tag{15A.1}$$

where $\{x(n-k), k = 0, 1, \ldots, M-1\}$ is the sample set, $h_{i,k} = w_k e^{j\omega_i k}$, and $w_k$ is a
window function. Clearly, if the $w_k$'s are chosen such that they are the coefficients of a
finite-impulse response low-pass filter, $h_{i,k}$ will be a band-pass filter with the center
frequency $\omega_i$. Also, if we choose $\omega_i = 2\pi i/L$, $i = 0, 1, \ldots, L-1$, the set of filters $h_{i,k}$
defines an L-band filter bank with the prototype filter $h_k = w_k$.

Substituting $h_{i,k} = w_k e^{j\omega_i k}$ and $\omega_i = 2\pi i/M$ (i.e., assuming that M = L) in
Eq. (15A.1) and defining $w_k x(n-k) = u_k$, we obtain

$$\hat{\Phi}_{xx}(e^{j\omega_i}) = \left|\sum_{k=0}^{M-1} u_k\,e^{j\frac{2\pi}{M}ik}\right|^2 \tag{15A.2}$$

One may note that the summation on the right-hand side of Eq. (15A.2) is the DFT of
the sequence $u_k$. Moreover, assuming that M is properly chosen, this summation can
be efficiently implemented using an FFT algorithm. This, in turn, implies that the filter
bank associated with the periodogram spectral estimator can be efficiently implemented
by weighting the input samples, using the weighting function $w_k$, and applying an FFT
to the windowed samples.
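
The windowed periodogram described above can be sketched in a few lines. The code below is an illustration only: it evaluates the sum $\left|\sum_k w_k x(n-k) e^{j2\pi ik/M}\right|^2$ directly for each of the M frequencies, rather than through an FFT, to keep the example self-contained. The function name and the test tone are assumptions made for this illustration.

```python
import cmath
import math


def periodogram(block, window):
    """Windowed periodogram at the M frequencies 2*pi*i/M, i = 0..M-1.

    block[k] holds x(n - k) and window[k] holds w_k, for k = 0..M-1.
    The sums are evaluated directly (O(M^2)); an FFT would give the same values.
    """
    M = len(block)
    u = [w * x for w, x in zip(window, block)]          # u_k = w_k x(n - k)
    return [abs(sum(u[k] * cmath.exp(2j * math.pi * i * k / M)
                    for k in range(M))) ** 2
            for i in range(M)]


M = 16
# Hypothetical real test tone centred exactly on frequency bin 3.
block = [math.cos(2 * math.pi * 3 * k / M) for k in range(M)]
spectrum = periodogram(block, [1.0] * M)                # rectangular window
print(round(spectrum[3], 6), round(spectrum[5], 6))
```

For a real tone that falls exactly on a bin, the energy appears at that bin and at its mirror image, and the remaining bins are essentially zero; a tone between bins would instead leak into neighbouring bins, which is what the choice of window controls.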

In its simplest form $w_n = 1$, for $n = 0, 1, \ldots, M-1$. This is a rectangular window
that is characterized by a sinc magnitude response. The sinc pulse is not desirable for
spectral estimators. This is because its relatively large side lobes result in a significant
spectral leakage among different frequency bands. By replacing the rectangular window
with a window function that smoothly decays on both sides (a taper), a prototype filter
with much smaller side lobes is obtained. There exists a wide range of window functions
from which one may choose. Among them, Hamming, Hanning, and Blackman are the
most popular and widely used window functions. They are, respectively, defined as

$$\text{Hamming:} \quad w_n = 0.54 - 0.46\cos\frac{2\pi n}{M-1} \tag{15A.3}$$

$$\text{Hanning:} \quad w_n = 0.5 - 0.5\cos\frac{2\pi n}{M-1} \tag{15A.4}$$

$$\text{Blackman:} \quad w_n = 0.42 - 0.5\cos\frac{2\pi n}{M-1} + 0.08\cos\frac{4\pi n}{M-1} \tag{15A.5}$$

The magnitudes of the frequency responses of these window functions, for M = 32, along with
that of the rectangular window, are presented in Figure 15A.1. Comparing the responses, we
find that (i) the rectangular window has the narrowest main lobe (equal to 2/M) while its
side lobes are largest in magnitude; (ii) the Hamming and Hanning windows achieve much
lower side lobes at the cost of a wider main lobe (equal to 4/M); (iii) the Blackman window
further improves the side lobes at the cost of further expansion of the width of the main
lobe (equal to 6/M).
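
The three window definitions translate directly into code. The following is a minimal sketch, assuming the definitions of Eqs. (15A.3) to (15A.5) with $n = 0, 1, \ldots, M-1$; the helper `magnitude_db`, which evaluates the magnitude response at a normalized frequency relative to the response at zero frequency, is a hypothetical name introduced only for this illustration.

```python
import cmath
import math


def hamming(M):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]


def hanning(M):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]


def blackman(M):
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / (M - 1))
            + 0.08 * math.cos(4 * math.pi * n / (M - 1)) for n in range(M)]


def magnitude_db(window, f):
    """Magnitude response in dB at normalized frequency f, relative to f = 0."""
    H = sum(w * cmath.exp(-2j * math.pi * f * n) for n, w in enumerate(window))
    return 20 * math.log10(abs(H) / abs(sum(window)))


M = 32
f_side = 3.5 / M        # a frequency beyond the main lobe of all four windows
for name, win in [("rectangular", [1.0] * M), ("Hamming", hamming(M)),
                  ("Hanning", hanning(M)), ("Blackman", blackman(M))]:
    print(name, round(magnitude_db(win, f_side), 1))
```

At a frequency outside all four main lobes, the tapered windows show a markedly lower response than the rectangular window, which is the side-lobe improvement discussed above.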

The above window functions offer very limited control over the width of the main lobe
and the size of the side lobes of the frequency response. The size of the side lobes and
the width of the main lobe are determined by the window type, and once a window type
is selected, one can control only the width of the main lobe by changing the window
length M. Moreover, the spectral sample estimates $\hat{\Phi}_{xx}(e^{j\omega_i})$, for $i = 0, 1, \ldots, M-1$,
obtained according to Eq. (15A.1) are very coarse, because no averaging is applied.
Clearly, the estimates can be smoothed by averaging samples from successive blocks of
$x(n)$. However, this results in some latency in the estimates, which, in the case of double-talk
detectors, is undesirable because, to avoid possible divergence of the AE canceller,
one wishes to detect any double-talk as soon as it starts. The multitaper method uses multiple
windows (tapers) over the same block of the input signal, hence the name multitaper, and thus
allows smoothing of the power spectral (and cross-power spectral) estimates with a minimum
latency. Furthermore, the multitaper method follows a systematic approach in selecting a
set of optimum window functions, called prolate sequences.

Figure 15A.1 Frequency responses of various window functions (rectangular, Hamming, Hanning, and Blackman) for M = 32. The responses are
normalized to a peak of 0 dB.


Prolate Sequences: The Optimal Window Functions 
for Multitaper Method 


The process of choosing a window $w_n$ with a target main lobe width $\Delta f$ and minimizing
the side lobes may be formulated as the following optimization problem:


Given a bandwidth $\Delta f$, design a low-pass FIR filter of length M whose main lobe is within
the range $(-\Delta f/2, \Delta f/2)$ and has the minimum stopband energy.


The coefficients of the designed filter, $w_n$, constitute a sequence that is called prolate. It
may also be viewed as a window function and is thus referred to as a prolate window.

Moreover, one may extend the above optimization and set the goal as finding a set
of K > 1 orthogonal window functions (hence, the name multitaper) whose main lobes
are within the range $(-\Delta f/2, \Delta f/2)$ and whose stopband energies are minimized. One
may realize that this optimization problem can be solved using the minimax theorem of
Chapter 4, by taking the following steps:

• Let $x(n)$ be a random process with power spectral density

$$\Phi_{xx}(f) = \begin{cases} 1, & -\Delta f/2 \le f \le \Delta f/2 \\ 0, & \text{otherwise} \end{cases} \tag{15A.6}$$

• Construct the M-by-M correlation matrix $\mathbf{R}$ of $x(n)$. This is the symmetric Toeplitz
matrix whose first row consists of the correlation coefficients of $x(n)$. These are
obtained as

$$\phi_{xx}(k) = \Delta f\,\mathrm{sinc}(\Delta f\,k), \quad \text{for } k = 0, 1, \ldots, M-1 \tag{15A.7}$$

• The first K eigenvectors of $\mathbf{R}$, that is, $\mathbf{q}_0, \mathbf{q}_1, \ldots, \mathbf{q}_{K-1}$, are the desired prolate sequences.

The window functions obtained according to this procedure are often called prolate
sequences. The term Slepian sequences is a synonym for prolate sequences; hence,
both are used interchangeably.
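
The procedure above can be sketched for the first prolate sequence only. The code below is an illustration under stated assumptions: it builds the Toeplitz matrix from $\phi_{xx}(k) = \Delta f\,\mathrm{sinc}(\Delta f\,k)$, with $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$, and finds the eigenvector of the largest eigenvalue by power iteration, a simple substitute for a full eigendecomposition that is adequate only for small examples like this one. The function names and the values of M and $\Delta f$ are hypothetical.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def band_limited_correlation(M, delta_f):
    """M-by-M Toeplitz matrix with entries phi(|k - l|), phi(k) = delta_f * sinc(delta_f * k)."""
    phi = [delta_f * sinc(delta_f * k) for k in range(M)]
    return [[phi[abs(k - l)] for l in range(M)] for k in range(M)]


def dominant_eigenpair(R, n_iter=2000):
    """Power iteration: (largest eigenvalue, unit-norm eigenvector) of a PSD matrix."""
    q = [1.0 / math.sqrt(len(R))] * len(R)
    for _ in range(n_iter):
        v = [sum(r * x for r, x in zip(row, q)) for row in R]
        norm = math.sqrt(sum(x * x for x in v))
        q = [x / norm for x in v]
    Rq = [sum(r * x for r, x in zip(row, q)) for row in R]
    lam = sum(a * b for a, b in zip(q, Rq))
    return lam, q


M, delta_f = 8, 0.25                 # small hypothetical example
R = band_limited_correlation(M, delta_f)
lam0, q0 = dominant_eigenpair(R)
print(lam0)
print([round(x, 4) for x in q0])
```

The eigenvalue returned is the fraction of the energy of the unit-norm window that falls inside the band $(-\Delta f/2, \Delta f/2)$, so a value close to 1 indicates small stopband energy. For larger time-bandwidth products the leading eigenvalues cluster near 1 and power iteration converges slowly, so a proper symmetric eigensolver would be the better choice there.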

Let us elaborate. We recall from Chapter 4 that Ay = E [lqgx(n)|?] is the output power of 
an FIR filter with the coefficient vector qg. Moreover, according to the minimax theorem, 
Qo is selected to maximize the output power of this filter. On the other hand, when x(n) is 
chosen to satisfy Eq. (15A.6), using the Rayleigh’s relation (Chapter 2), one will find that 


$$\begin{aligned}
\lambda_0 &= \max \int_{-0.5}^{0.5} |Q_0(f)|^2\,\Phi_{xx}(f)\,df \\
&= \max \int_{-\Delta f/2}^{\Delta f/2} |Q_0(f)|^2\,df \\
&= 1 - \min \int_{\Delta f/2}^{1-\Delta f/2} |Q_0(f)|^2\,df
\end{aligned} \tag{15A.8}$$


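where $Q_0(f)$ is the frequency response of the filter with the coefficient vector $\mathbf{q}_0$, and
the last identity follows from Parseval's identity $\mathbf{q}_0^T\mathbf{q}_0 = \int_{-0.5}^{0.5} |Q_0(f)|^2\,df = 1$.
Accordingly, one may find that $\mathbf{q}_0$ is the coefficient vector of a filter whose stopband
energy $\int_{\Delta f/2}^{1-\Delta f/2} |Q_0(f)|^2\,df$ is minimized; that is, it attains the maximum attenuation of
the side lobes.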
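
The relation between the eigenvalues and the passband/stopband energies can be checked numerically. The sketch below (Python with NumPy, an assumption of this illustration) rebuilds the matrix R of Eq. (15A.7), approximates the integrals of Eq. (15A.8) by Riemann sums on a dense FFT grid, and compares the passband energy of each window with its eigenvalue. The grid size `nfft` is an arbitrary choice.

```python
import numpy as np

M, L, K = 64, 8, 8
delta_f = 1.0 / L
k = np.arange(M)
R = (delta_f * np.sinc(delta_f * k))[np.abs(k[:, None] - k[None, :])]
lam, Q = np.linalg.eigh(R)
lam, Q = lam[::-1][:K], Q[:, ::-1][:, :K]         # K largest eigenvalues first

nfft = 1 << 16
f = np.arange(nfft) / nfft                        # one period, 0 <= f < 1
power = np.abs(np.fft.fft(Q, nfft, axis=0)) ** 2  # |Q_i(f)|^2, one column per window
in_band = (f <= delta_f / 2) | (f >= 1 - delta_f / 2)

total = power.sum(axis=0) / nfft                  # Parseval: q_i^T q_i = 1
passband = power[in_band].sum(axis=0) / nfft      # energy inside the main lobe
stopband = total - passband                       # energy outside the main lobe

for i in range(K):
    print(i, round(lam[i], 6), round(passband[i], 6), round(stopband[i], 6))
```

Up to the discretization error of the Riemann sum, the passband energy of each window equals its eigenvalue, and the stopband energy equals one minus that eigenvalue.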

Following the same argument, $\mathbf{q}_1$ will also be the coefficient vector of a low-pass filter
whose stopband begins at $\Delta f/2$ and achieves the minimum stopband energy, subject to the
constraint $\mathbf{q}_0^T\mathbf{q}_1 = 0$. Clearly, this will result in a filter whose stopband attenuation will not
be as good as that of $\mathbf{q}_0$. Proceeding further, one finds that subsequent filters, $\mathbf{q}_2, \mathbf{q}_3, \ldots$,
will experience more loss in their stopband attenuation because of more constraints.

From the above discussions, the prolate sequences define the coefficients of a set of 
prototype filters with certain optimal properties. In particular, the good stopband behavior 
of these filters makes them a desirable candidate in the application of nonparametric 
spectral estimation, particularly, in applications where a wide spectral dynamic range is 
required. It is also worth noting that the orthogonality condition imposed on the prolate
filters is to assure that, under the condition where the variation of $\Phi_{xx}(e^{j\omega})$ over each subband is
negligible, the set of outputs from the various filter banks that correspond to the same subband
will be uncorrelated. Hence, averaging the energy/cross-correlation of the signals from
the filter banks results in power spectral/cross-power spectral estimates with a minimum
variance.


An Example of Prolate Sequences 


The numerical example given here is to provide a more in-depth understanding of the
properties of the prolate sequences as a set of multitaper window functions. We choose
M = 64 and set $\Delta f = 1/L$, where L = 8 is the number of points (approximated by
frequency bands) at which the power/cross-power spectra of the underlying signals have to be
estimated. We also choose the first K = M/L = 8 prolate sequences obtained according
to the procedure mentioned above.

Figure 15A.2 presents the magnitude responses of the filters associated with these 
sequences. This figure reveals the following facts. 


• Only the first few filters have good stopband attenuation.

• Thomson (1982) has identified the number of useful prolate filters as K − 1.

• As is evident from the results of Figure 15A.2, the stopband responses of the prolate
filters deteriorate very fast in the higher numbered filters. Here, $\mathbf{q}_{K-2}$ attains a stopband
response of −20 to −25 dB. The Kth prolate filter $\mathbf{q}_{K-1}$ has a stopband response of less
than −20 dB.


The number of K — 1 prolate filters, predicted by Thomson, is a soft limit. For some 
cases, the use of the first K prolate filters may be also acceptable. Experiments that we 
have carried out for double-talk detection have convinced us that the prediction made by 
Thomson is also applicable for a fair estimate of c,,(e/°). The results that are presented 
in Figures 15.10 and 15.12 are based on K prolate filters. 

Figure 15A.2 Magnitude response of a set of prolate sequences/filters.

The results presented in Figure 15A.2 have been obtained using the MATLAB program
“prolates.m,” available on the accompanying website. Interested readers are
encouraged to run this program for different choices of the parameters to see their effects.


Polyphase Filter Banks 


According to the discussion in Section 15.3.3, an implementation of the multitaper method
requires realization of multiple sets of filter banks. An example of such filter banks was
presented in Figure 15.14. We also recall that a filter bank is naturally implemented based
on a prototype filter. Here, we consider a filter bank with a prototype filter

$$H(z) = H_0(z) = \sum_{n=0}^{M-1} h_n z^{-n} \tag{15A.9}$$


Here, we are interested in implementing the set of filters 


$$H_m(z) = \sum_{n=0}^{M-1} h_n e^{j2\pi mn/L} z^{-n} \tag{15A.10}$$

for m = 0, 1, ..., L − 1. The set of filters (15A.10) can be implemented efficiently in a
common structure, called polyphase. To develop such a polyphase structure, we assume
that K = M/L is an integer, let

$$n = kL + l \tag{15A.11}$$

and note that the range 0 ≤ n ≤ M − 1 will be covered by letting k = 0, 1, ..., K − 1
and l = 0, 1, ..., L − 1. Substituting Eq. (15A.11) in Eq. (15A.10), we obtain

$$H_m(z) = \sum_{l=0}^{L-1} e^{j2\pi ml/L} z^{-l} E_l(z^L) \tag{15A.12}$$

where

$$E_l(z) = \sum_{k=0}^{K-1} h_{kL+l}\, z^{-k} \tag{15A.13}$$

is the lth polyphase component of H(z).

Figure 15A.3 Polyphase filter bank.

Using Eq. (15A.12), one will find that the set of the transfer functions $H_m(z)$, for
m = 0, 1, ..., L − 1, can be jointly implemented using the polyphase structure presented
in Figure 15A.3. One may note that the computation of each set of output samples in this
structure involves M multiplications and (approximately) M additions and one L-point
IDFT operation. Clearly, the latter may be implemented efficiently using an FFT. Finally,
we note that in the implementation of a multitaper, the prolate sequences $\mathbf{q}_0, \mathbf{q}_1, \ldots$ are
used as the prototype filters, and a number of independent filter banks are implemented.
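
To make the polyphase structure concrete, the following sketch (Python with NumPy, an assumption of this illustration) computes the L filter outputs in two ways: through the polyphase components followed by an L-point IDFT, and by filtering directly with each modulated prototype. It assumes the modulation $e^{j2\pi mn/L}$ for the mth filter and zero initial conditions; the function names and the random test data are hypothetical.

```python
import numpy as np

def polyphase_bank(h, L, x):
    """Outputs y[m, n] of the L modulated filters, computed via the polyphase structure."""
    M = len(h)
    K = M // L                                  # assumes M is an integer multiple of L
    n_out = len(x)
    xp = np.concatenate([np.zeros(M - 1), x])   # xp[M - 1 + n] = x(n), zeros before n = 0
    u = np.zeros((L, n_out))                    # u[l] is the output of z^{-l} E_l(z^L)
    for l in range(L):
        for k in range(K):
            d = k * L + l                       # delay of tap h[kL + l]
            u[l] += h[d] * xp[M - 1 - d : M - 1 - d + n_out]
    # The sum over l of exp(j 2 pi m l / L) u_l(n) is L times an L-point IDFT.
    return L * np.fft.ifft(u, axis=0)

def direct_bank(h, L, x):
    """Reference: filter x with each modulated prototype separately."""
    n = np.arange(len(h))
    return np.array([np.convolve(x, h * np.exp(2j * np.pi * m * n / L))[: len(x)]
                     for m in range(L)])

rng = np.random.default_rng(0)
h = rng.standard_normal(16)                     # hypothetical prototype, M = 16
x = rng.standard_normal(50)
y_poly = polyphase_bank(h, 4, x)
y_direct = direct_bank(h, 4, x)
print(np.max(np.abs(y_poly - y_direct)))        # difference between the two routes
```

The polyphase route uses M real multiplications per time index plus one L-point IDFT, whereas the direct route needs M complex multiplications for each of the L filters.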


Appendix 15B: Derivation of the Two-Channel
Levinson-Durbin Algorithm


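If $\mathbf{b}_0(n), \mathbf{b}_1(n), \ldots, \mathbf{b}_{L-1}(n)$ form an orthogonal basis set for $\mathbf{x}(n), \mathbf{x}(n-1), \ldots,
\mathbf{x}(n-L+1)$, then the backward prediction-error vectors are given by

$$\mathbf{b}_m(n) = \mathbf{x}(n-m) - \sum_{j=1}^{m} \mathbf{G}_{m,j}\,\mathbf{x}(n-j+1) \tag{15B.1}$$

Similarly, the forward prediction-error vectors are given by

$$\mathbf{f}_m(n) = \mathbf{x}(n) - \sum_{j=1}^{m} \mathbf{A}_{m,j}\,\mathbf{x}(n-j) \tag{15B.2}$$

We also recall the update equations

$$\mathbf{f}_{m+1}(n) = \mathbf{f}_m(n) - \boldsymbol{\kappa}'_{m+1}\,\mathbf{b}_m(n-1) \tag{15B.3}$$

and

$$\mathbf{b}_{m+1}(n) = \mathbf{b}_m(n-1) - \boldsymbol{\kappa}_{m+1}\,\mathbf{f}_m(n) \tag{15B.4}$$

respectively.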
Using Eqs. (15B.1) and (15B.2), we can expand Eq. (15B.4) as 


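$$\mathbf{x}(n-m-1) - \sum_{j=0}^{m} \mathbf{G}_{m+1,j+1}\,\mathbf{x}(n-j) = \mathbf{x}(n-m-1) - \sum_{j=1}^{m} \mathbf{G}_{m,j}\,\mathbf{x}(n-j) - \boldsymbol{\kappa}_{m+1}\Big(\mathbf{x}(n) - \sum_{j=1}^{m} \mathbf{A}_{m,j}\,\mathbf{x}(n-j)\Big) \tag{15B.5}$$

On equating the coefficients of $\mathbf{x}(n-j)$, we obtain

$$\mathbf{G}_{m+1,1} = \boldsymbol{\kappa}_{m+1} \tag{15B.6}$$

$$\mathbf{G}_{m+1,j+1} = \mathbf{G}_{m,j} - \boldsymbol{\kappa}_{m+1}\mathbf{A}_{m,j}, \quad j = 1, 2, \ldots, m \tag{15B.7}$$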


Similarly, we can work with the forward prediction-errors and use Eqs. (15B.1) and 
(15B.2) to expand Eq. (15B.3) as 


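$$\mathbf{x}(n) - \sum_{j=1}^{m+1} \mathbf{A}_{m+1,j}\,\mathbf{x}(n-j) = \mathbf{x}(n) - \sum_{j=1}^{m} \mathbf{A}_{m,j}\,\mathbf{x}(n-j) - \boldsymbol{\kappa}'_{m+1}\Big(\mathbf{x}(n-m-1) - \sum_{j=1}^{m} \mathbf{G}_{m,j}\,\mathbf{x}(n-j)\Big) \tag{15B.8}$$

Once again, equating coefficients of $\mathbf{x}(n-j)$ will lead to the following result

$$\mathbf{A}_{m+1,m+1} = \boldsymbol{\kappa}'_{m+1} \tag{15B.9}$$

$$\mathbf{A}_{m+1,j} = \mathbf{A}_{m,j} - \boldsymbol{\kappa}'_{m+1}\mathbf{G}_{m,j}, \quad j = 1, 2, \ldots, m \tag{15B.10}$$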


Equations (15B.6), (15B.7), (15B.9), and (15B.10) constitute the two-channel 
Levinson—Durbin algorithm that may be used to convert the reflection coefficients to the 
predictor coefficients. 


16 


Active Noise Control 


Acoustic noise occurs in different forms in a variety of environments. Clearly, such noises
are undesirable and methods of reducing them are always sought. Conventionally, the use
of passive materials to absorb an acoustic noise or contain it within an enclosure has been
considered. However, these methods are generally found unsuccessful in suppressing the low
frequency portions of acoustic signals. This is because for an absorber to be effective its
thickness should be comparable to or larger than the wavelength of the acoustic signal that
it is meant to suppress. Moreover, at low frequencies, the acoustic wavelength grows
very fast. For instance, while, in air, the wavelength of a 5 kHz acoustic signal is 6.8 cm,
it increases to 3.4 m at 100 Hz.

The term active noise control (ANC) refers to the methods where acoustic (and, also,
hydroacoustic) noise signals are canceled using a combination of electromechanical systems.
ANC methods are built based on the fact that an acoustic signal/pressure can be
nullified by introducing an acoustic pressure of opposite direction. To put this in the right
perspective and also to be able to discuss the advantages as well as the limitations of ANC
methods, in this chapter, we present two (important) examples of ANC systems.

The first case of interest is the problem of noise cancellation in air ducts, for example, 
air conditioning ducts and exhaust pipes in cars. This case is often exemplified by the noise 
cancellation setup that was presented in Figure 1.20 and for convenience of reference is 
repeated here in Figure 16.1. 

The second case that we consider is when a primary acoustical source has generated 
a signal (noise) that reaches a point P in space with a pressure of p(t). A microphone 
measures this pressure and through an ANC filter instructs a secondary source (a canceling 
loudspeaker) at a point near P to broadcast the same signal, but with an opposite sign. 
This concept is presented in Figure 16.2. We note that when the primary source is at 
some location far away from P, the cancellation will be limited to the points near P, that 
is, a silent zone around the error microphone is generated. This is because the acoustic 
pressure resulting from the secondary source vanishes in space relatively fast, while that 
of the primary source remains nearly intact at the points surrounding P. This may be 
understood, if one notes that the sound pressure resulting from a source in free space at
a distance r is proportional to $r^{-1}$.

Figure 16.1 ANC in an air duct.

Figure 16.2 ANC surrounding a point P.

The setups presented in Figures 16.1 and 16.2 are different in a number of ways. While
the setup in Figure 16.2 is that of an acoustic noise cancellation in a three-dimensional
space, the one in Figure 16.1 may be thought of as a noise cancellation in a one-dimensional
sound field along a pipe/duct, that is, a one-dimensional space. As a result,
in Figure 16.1, once the acoustic signal/pressure is suppressed at a point along the duct, it
will also be suppressed at all points to the right-hand side of that point. The ANC setup in
Figure 16.2, on the other hand, is only able to generate a silent zone surrounding the point
P. The setup in Figure 16.1 consists of a canceling loudspeaker, a reference microphone,
and an error microphone. The setup in Figure 16.2, on the other hand, consists of only
a canceling loudspeaker and an error microphone. There is no reference microphone in
Figure 16.2. Accordingly, the setup in Figure 16.1 may be categorized as an open loop
system in which the signal to the canceling loudspeaker is generated by filtering the signal
picked up by the reference microphone. The feedback from the error microphone is
used to evaluate the residual error that will be used for the adaptation of the ANC filter.
In Figure 16.2, on the other hand, the signal picked up by the error microphone, which
partly comes from the canceling loudspeaker, is the input to the ANC filter that drives
the canceling loudspeaker. This is clearly a closed loop system.

In the rest of this chapter, we elaborate on the details of the implementation of the open
loop ANC setup of Figure 16.1 as well as the closed loop ANC setup of Figure 16.2. We
also introduce the multichannel ANC systems where multiple reference/error microphones
and canceling loudspeakers are used to improve the performance of ANC systems. The
multichannel extension of Figure 16.2, in particular, is important as it allows expansion of
the size of the silent zone. Many acoustic noises originate from rotary electromechanical
systems, such as motors, engines, compressors, fans, and propellers. Such systems lead
to an acoustic noise that is periodic and hence consists of a fundamental sine wave and
its harmonics. ANC systems that are designed to cancel sinusoidal/periodic noises are
referred to as narrowband, for obvious reasons. Other types of ANC systems are
referred to as broadband.


16.1 Broadband Feedforward Single-Channel ANC 


Broadband feedforward single-channel ANC systems are often exemplified by the ANC
in an air duct, that is, the case presented in Figure 16.1. Figure 16.3 repeats the same
figure and adds to it the pertinent transfer functions related to the system design. The
acoustic path between the points near the reference microphone and the error microphone
is denoted by P(z). This we refer to as the primary path. There are also two secondary
paths between the canceling loudspeaker and the reference and error microphones, denoted
as $S_1(z)$ and $S_2(z)$, respectively. The ANC filter has to be adapted so that the sum of
the acoustic signals at the error microphone vanishes. We also note that although
in practice the underlying acoustic paths are analog in nature, here, we have chosen to
present them with the discrete transfer functions P(z), $S_1(z)$, and $S_2(z)$, for consistency
of our presentation with the rest of the chapters in this book as well as for notational
convenience.

Figure 16.3 Details of ANC in an air duct.


16.1.1 System Block Diagram in the Absence of the Secondary Path $S_1(z)$


At this stage of the system development, to simplify the discussion, the secondary
path from the canceling loudspeaker to the reference microphone, $S_1(z)$, is assumed
to be absent. We add this and accordingly modify the proposed ANC system later, in
Section 16.1.4.

Figure 16.4 presents a block diagram of the broadband feedforward single-channel
ANC system of Figure 16.3, in the absence of $S_1(z)$. The adaptive filter W(z) denotes
the transfer function of the ANC filter. The error microphone is denoted by a summing
node whose inputs are d(n) and y(n). This arrangement is slightly different from the
conventional adaptive filter structures that have been presented so far in this book. We
note that this has no significant practical implication since with this new arrangement, the
tap weights of the adaptive filter W(z) converge to the negative of the values they would
take if a minus sign were added to the left-side input of the summing node.

One may realize that if the secondary path $S_2(z)$ is removed from Figure 16.4, it reduces
to the system modeling problems that were introduced in various parts of the previous
chapters, for example, Figure 6.5. In that case, the error signal e(n) vanishes if
W(z) is chosen such that W(z) + P(z) = 0, or if we let


W(z) = —P(z) (16.1) 


The presence of $S_2(z)$ in Figure 16.4 leads to a different modeling problem whose
solution, for reducing e(n) to zero, should satisfy the following equation

$$W(z)S_2(z) + P(z) = 0 \tag{16.2}$$

Solving Eq. (16.2) for W(z), we obtain

$$W(z) = -\frac{P(z)}{S_2(z)} \tag{16.3}$$


In practice, Eq. (16.3) may not be a feasible solution. First, because $1/S_2(z)$ may not
be a stable system. Second, even when $1/S_2(z)$ is a stable system, because of the reasons
discussed in Chapter 3, the choice of a recursive filter for W(z) is usually undesirable.
Hence, in practice, the solution (16.3) is approximated by an (FIR) filter that may be
adapted using a variety of the algorithms that were discussed in the previous chapters.
However, we note that the presence of the transfer function $S_2(z)$ between the adaptive
filter W(z) and the error signal e(n) requires a careful revisit and modification of any
adaptive algorithm that one selects to use. Such a development for the case of the LMS
algorithm is presented next.

Figure 16.4 A simplified block diagram of the air duct ANC when the secondary path $S_1(z)$ is
removed. The input x(n) is the noise signal at the reference microphone.


16.1.2  Filtered-X LMS Algorithm 


The filtered-X LMS algorithm refers to a version of the LMS algorithm that is obtained by
applying the latter to the modeling problem of Figure 16.4. The choice of the prefix
filtered-X will become evident as we go through the development of the algorithm.

As in the case of the conventional LMS algorithm, we begin with the instantaneous cost
function $\hat{\xi}(n) = e^2(n)$ and use this in the implementation of the steepest-descent update

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\nabla e^2(n) \tag{16.4}$$

We note that, here,

$$\begin{aligned} e(n) &= d(n) + y(n) \\ &= d(n) + \mathbf{w}^T(n)\mathbf{x}'(n) \end{aligned} \tag{16.5}$$


where $\mathbf{w}(n)$ is the tap-weight vector of the adaptive filter W(z) at the time instant n,
and the second line is obtained by noting that y(n) is obtained by passing x(n) through
the cascade of W(z) and $S_2(z)$, swapping the order of W(z) and $S_2(z)$, and defining
$x'(n)$ as a filtered version of x(n) obtained from passing x(n) through $S_2(z)$. Clearly, the
prefix filtered-X conveys this rearrangement of the underlying signal sequences. Moreover,
assuming that the ANC filter has a length of N, we follow the convention of the previous
chapters and define the filtered-input and tap-weight vectors

$$\mathbf{x}'(n) = [x'(n)\;\; x'(n-1)\;\; \cdots\;\; x'(n-N+1)]^T \tag{16.6}$$

and

$$\mathbf{w}(n) = [w_0(n)\;\; w_1(n)\;\; \cdots\;\; w_{N-1}(n)]^T \tag{16.7}$$

respectively.

Substituting Eq. (16.5) in Eq. (16.4), the filtered-X LMS update is obtained as

$$\mathbf{w}(n+1) = \mathbf{w}(n) - 2\mu e(n)\mathbf{x}'(n) \tag{16.8}$$


Moreover, the block diagram presented in Figure 16.5 depicts the above developments in
a clear-to-understand form. In this diagram, $\hat{S}_2(z)$ is an estimate of $S_2(z)$ that may also
be obtained through an LMS algorithm, before the activation of the ANC filter, W(z).
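
The following sketch (Python with NumPy, an assumption of this illustration) simulates the filtered-X LMS recursion for the simplified system in which the feedback path to the reference microphone is absent. The primary and secondary path coefficients, the filter length, and the step size are hypothetical values chosen only so that the example converges quickly, and the secondary-path estimate is taken to be exact.

```python
import numpy as np

def fxlms(x, d, s2, s2_hat, N, mu):
    """Filtered-X LMS; returns the error e(n) and the final tap weights."""
    w = np.zeros(N)
    x_buf = np.zeros(N)                  # x(n), ..., x(n-N+1): input of W(z)
    xf_buf = np.zeros(N)                 # x'(n), ..., x'(n-N+1): filtered reference
    u_buf = np.zeros(len(s2))            # recent outputs of W(z): input of S2(z)
    xs_buf = np.zeros(len(s2_hat))       # recent x(n): input of the estimate of S2(z)
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1); x_buf[0] = x[n]
        xs_buf = np.roll(xs_buf, 1); xs_buf[0] = x[n]
        u_buf = np.roll(u_buf, 1); u_buf[0] = w @ x_buf
        y = s2 @ u_buf                   # anti-noise arriving at the error microphone
        e[n] = d[n] + y                  # the microphone adds d(n) and y(n)
        xf_buf = np.roll(xf_buf, 1); xf_buf[0] = s2_hat @ xs_buf
        w = w - 2 * mu * e[n] * xf_buf   # update with the filtered reference x'(n)
    return e, w

rng = np.random.default_rng(1)
x = rng.standard_normal(5000)                     # broadband reference noise
p = np.array([0.0, 0.0, 0.0, 0.8, 0.4, -0.2])     # hypothetical primary path P(z)
s2 = np.array([0.0, 0.9, 0.3])                    # hypothetical secondary path S2(z)
d = np.convolve(x, p)[: len(x)]                   # noise at the error microphone
e, w = fxlms(x, d, s2, s2_hat=s2, N=16, mu=0.005)
print(np.mean(d[-500:] ** 2), np.mean(e[-500:] ** 2))
```

The primary path is given a longer delay than the secondary path so that the ideal ANC filter is causal and can be approximated well by a short FIR filter.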


16.1.3 Convergence Analysis 


Convergence behavior of the filtered-X LMS algorithm is very similar to that of the
LMS algorithm. To provide some insight, we note that under a slow adaptation, that
is, when the step-size parameter $\mu$ is sufficiently small (e.g., when $\mu$ is selected to
achieve a misadjustment of 10% or less), over any short period of time W(z) may be
assumed to be a linear and time-invariant system. Thus, if the duration of the impulse
response of $S_2(z)$ can be considered as short, one may swap the blocks W(z) and $S_2(z)$ in
Figure 16.5. Moreover, if we assume that $\hat{S}_2(z) = S_2(z)$, Figure 16.5 may be rearranged
as in Figure 16.6.

Figure 16.5 Detail of implementation of the filtered-X LMS algorithm.

Considering Figure 16.6, we have an adaptive filter with the input signal $x'(n)$ and a
desired output d(n). From the discussion in Chapter 6, one may recall that the convergence
behavior of the LMS algorithm is mostly determined by the statistics of its input signal.

Figure 16.6 An equivalent to the block diagram of Figure 16.5.

More specifically, the convergence behavior of the LMS algorithm is controlled by a
number of time constants whose values are given by

$$\tau_i = \frac{1}{4\mu\lambda_i}, \quad \text{for } i = 0, 1, \ldots, N-1 \tag{16.9}$$

where the $\lambda_i$'s are the eigenvalues of the correlation matrix $\mathbf{R}$ of the tap inputs of the adaptive
filter. Here, the input to the adaptive filter is $x'(n)$ and, thus,


$$\mathbf{R} = E[\mathbf{x}'(n)\mathbf{x}'^T(n)] \tag{16.10}$$

According to the analysis presented in Chapter 6, one may say that a sure convergence
of the LMS algorithm is guaranteed if

$$0 < \mu < \frac{1}{3\,\mathrm{tr}[\mathbf{R}]} \tag{16.11}$$

Although this range of $\mu$ works well in most cases of ANC systems, for some cases
the upper limit of the range, $1/(3\,\mathrm{tr}[\mathbf{R}])$, may not be tight enough. Going back to Figure 16.4,
one may note that the presence of the secondary path between the ANC filter and the error
microphone may introduce a significant delay in the adaptation loop and such delay, like
in any feedback loop, can be a source of instability. In the context of ANC adaptation,
this may mean the instability (i.e., divergence) of the filtered-X LMS algorithm. Hence,
when the delay introduced by $S_2(z)$ is significantly large, one may reduce the upper bound
of the stability range (16.11). Alternatively, by positioning the error microphone closer to
the canceling loudspeaker, one may reduce the delay in the adaptation loop and thus reduce
the chance of instability of the filtered-X LMS algorithm.
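
As a small numerical illustration of the time constants and the step-size bound, the sketch below (Python with NumPy, an assumption of this illustration) builds the correlation matrix of the filtered reference for the special case of a white, unit-variance reference signal, in which case the correlation coefficients of $x'(n)$ are the deterministic autocorrelation of the secondary-path estimate. It assumes the time constants $1/(4\mu\lambda_i)$ and the bound $1/(3\,\mathrm{tr}[\mathbf{R}])$. The function name and the path coefficients are hypothetical.

```python
import numpy as np

def fxlms_modes(s2_hat, N, mu):
    """Eigenvalues of R, the time constants 1/(4 mu lambda_i), and the step-size bound."""
    s = np.asarray(s2_hat, dtype=float)
    r = np.correlate(s, s, mode="full")[len(s) - 1:]   # r(0), r(1), ... of x'(n)
    r = np.concatenate([r, np.zeros(N)])[:N]           # pad or truncate to N lags
    idx = np.arange(N)
    R = r[np.abs(idx[:, None] - idx[None, :])]         # symmetric Toeplitz R
    lam = np.linalg.eigvalsh(R)
    mu_max = 1.0 / (3.0 * np.trace(R))
    tau = 1.0 / (4.0 * mu * lam)
    return lam, tau, mu_max

lam, tau, mu_max = fxlms_modes([0.0, 0.9, 0.3], N=16, mu=0.005)
print(mu_max, tau.min(), tau.max())
```

The spread between the smallest and largest time constants reflects the coloring that the secondary path adds to the reference signal. As noted above, a long delay in the secondary path may call for a step size well below `mu_max`.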

Yet, the ANC systems, in general, suffer from a number of other problems that should
be carefully considered in their design and implementation. The noise signal x(n) picked
up by the reference microphone may be highly colored. More coloring will be added
to it after passing through the secondary path estimate $\hat{S}_2(z)$. This in turn implies that
the correlation matrix $\mathbf{R} = E[\mathbf{x}'(n)\mathbf{x}'^T(n)]$, whose eigenvalues determine the modes of
convergence of the filtered-X LMS algorithm, may have a number of eigenvalues that
are close to zero. Hence, the filtered-X LMS algorithm may converge very slowly or diverge
away from the desired optimum solution within the subspace that corresponds to the near
zero eigenvalues of $\mathbf{R}$. This problem may be resolved by replacing the update equation
(16.8) by its leaky version

$$\mathbf{w}(n+1) = \beta\mathbf{w}(n) - 2\mu e(n)\mathbf{x}'(n) \tag{16.12}$$

where $\beta$ is a constant smaller than, but close to, 1. An interested reader may refer to
Section 15.5.2 for a detailed discussion of the leaky LMS algorithm.


16.1.4 Adding the Secondary Path $S_1(z)$


Figure 16.7 An equivalent to the block diagram of Figure 16.3.

Figure 16.7 presents a complete block diagram equivalent to Figure 16.3. This is an
improved diagram over the one presented in Figure 16.4, where the secondary path $S_1(z)$
was ignored. The transfer function between the input x(n) and the output error e(n) may
be evaluated and shown to be


$$\frac{E(z)}{X(z)} = P(z) + \frac{W(z)S_2(z)}{1 - S_1(z)W(z)} \tag{16.13}$$

Letting e(n) = 0, thus, E(z) = 0, and solving Eq. (16.13) for W(z), we obtain

$$W(z) = \frac{P(z)}{P(z)S_1(z) - S_2(z)} \tag{16.14}$$
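
The algebra leading from Eq. (16.13) to Eq. (16.14) can be spot-checked by evaluating both expressions at a single point z, where each transfer function is just a complex number. The short Python sketch below assumes the forms $E(z)/X(z) = P(z) + W(z)S_2(z)/(1 - S_1(z)W(z))$ and $W(z) = P(z)/(P(z)S_1(z) - S_2(z))$; the function names and the numerical values are hypothetical.

```python
def error_gain(P, S1, S2, W):
    """E(z)/X(z) evaluated at one point z."""
    return P + W * S2 / (1 - S1 * W)

def ideal_anc(P, S1, S2):
    """The ANC filter W(z) that nulls the error, evaluated at one point z."""
    return P / (P * S1 - S2)

# Hypothetical values of P(z), S1(z), and S2(z) at some z on the unit circle.
P, S1, S2 = 0.5 - 0.2j, 0.1 + 0.3j, 0.8 + 0.1j
W = ideal_anc(P, S1, S2)
print(abs(error_gain(P, S1, S2, W)))      # residual error gain
print(ideal_anc(P, 0.0, S2), -P / S2)     # without feedback, W reduces to -P/S2
```

Setting the feedback path to zero recovers the simpler solution obtained when the path from the canceling loudspeaker to the reference microphone is ignored.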


As discussed in the case of Eq. (16.3), the realization of the ANC filter W(z) of
Figure 16.7 in an IIR form as predicted by Eq. (16.14) is also problematic. Hence, an FIR
solution should be sought. However, unfortunately, unlike Eq. (16.3), the approximation
of Eq. (16.14) by an FIR filter is not possible because of the following reason. In practice, the
canceling loudspeaker and the error microphone are placed at some distance away from
the reference microphone. The error microphone, on the other hand, is placed at a point
near the canceling loudspeaker. This arrangement results in high order transfer functions
for the primary path P(z) and the secondary path $S_1(z)$, and a significantly lower order
transfer function for the secondary path $S_2(z)$. Therefore, one may note that while the
denominator of the right-hand side of Eq. (16.3) contains a low order polynomial, the
denominator of the right-hand side of Eq. (16.14) contains a significantly higher order
polynomial. This, in turn, implies that while the transfer function (16.3) can be reasonably
approximated by an FIR filter, this may not be possible for the transfer function (16.14),
unless a very high order FIR filter is used. To resolve this problem, the following solution
has been adopted in most of the ANC systems. An estimate of the secondary path $S_1(z)$,
say, $\hat{S}_1(z)$, is obtained and used to generate a model of $S_1(z)$ and remove its effect digitally
as presented in Figure 16.8. One may note that this is very similar to the acoustic echo
cancellation setup that was presented in Chapter 15.

One may note that when $\hat{S}_1(z) = S_1(z)$, the block diagram presented in Figure 16.8
reduces to the one in Figure 16.4, hence, all the developments subsequent to the presentation
of the latter figure are also applicable to the former. In particular, the addition of
the filtered-X LMS to Figure 16.8 is a straightforward task. Doing so, and replacing the
acoustic blocks in Figure 16.8 by the actual components (the duct, the loudspeaker, and
the microphones), a complete ANC system for air ducts is obtained. This is presented in
Figure 16.9.

Figure 16.8 Block diagram showing how the effect of the secondary/feedback path $S_1(z)$
is removed.

The ANC system presented in Figure 16.9 has two modes of operation: (i) offline
processing and (ii) online processing. In the offline mode, the input to the canceling
loudspeaker comes from a broadband noise source and the secondary paths $S_1(z)$ and
$S_2(z)$ are estimated using an adaptive algorithm, for example, an LMS algorithm. Once
the convergence of $\hat{S}_1(z)$ and $\hat{S}_2(z)$ is complete, the ANC is switched to the online mode
where the adaptation of the ANC filter W(z) begins.
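
A minimal sketch of the offline mode is given below (Python with NumPy, an assumption of this illustration). A white noise sequence drives the canceling loudspeaker, the microphone signal is modeled as the noise filtered by a secondary path, and a conventional LMS algorithm identifies that path. The path coefficients, the model length, and the step size are hypothetical, and measurement noise is ignored.

```python
import numpy as np

def lms_identify(v, r, N, mu):
    """LMS system identification: adapts s_hat so that s_hat * v tracks r."""
    s_hat = np.zeros(N)
    buf = np.zeros(N)                         # v(n), v(n-1), ..., v(n-N+1)
    for n in range(len(v)):
        buf = np.roll(buf, 1); buf[0] = v[n]
        err = r[n] - s_hat @ buf              # modeling error
        s_hat = s_hat + 2 * mu * err * buf    # conventional LMS update
    return s_hat

rng = np.random.default_rng(2)
v = rng.standard_normal(5000)                 # broadband training noise
s2 = np.array([0.0, 0.9, 0.3, -0.1])          # hypothetical secondary path
r = np.convolve(v, s2)[: len(v)]              # signal picked up by the microphone
s2_hat = lms_identify(v, r, N=8, mu=0.01)
print(np.round(s2_hat, 4))
```

The same routine may be run once per microphone to estimate both secondary paths before the ANC filter is switched on.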


16.2 Narrowband Feedforward Single-Channel ANC 


In many applications of ANC, the underlying noise sources and, thus, the generated noise
signals are periodic. Such periodic noise signals are generated by rotary machines, such as
engines, motors, fans, and compressors. Recalling from the Fourier series theory that any
periodic signal can be expanded as a summation of a number of sine waves, consisting of
a fundamental and its harmonic components, one may adopt the adaptive filtering structure
presented in Figure 16.10 which, without any loss of generality, is given for ANC in an air
duct. A tachometer/sensor detects the fundamental period/frequency of the noise source.
This information is passed to a signal generator to generate a periodic signal with the
same fundamental frequency as the source. The generated signal is passed to an adaptive
filter (the ANC filter) which shapes its spectrum to match that of the noise amplitude and
phase (with an opposite sign) at the canceling loudspeaker. The error microphone picks
the residual noise and instructs the adaptation of the ANC filter.

Figure 16.9 Complete ANC system for an air duct.

Comparing Figure 16.10 with its broadband counterpart in Figure 16.1, one may observe 
the following fundamental differences. In Figure 16.1, the broadband noise is picked up 
by the reference microphone. In Figure 16.10, on the other hand, the period/frequency of 
the periodic noise is obtained through a nonacoustic device. A reference signal is then 
generated accordingly. In both cases (Figures 16.1 and 16.10), the ANC filter shapes 
the spectrum of the reference signal such that a correct anti-noise is generated at the 
canceling loudspeaker. However, the feedback from the canceling loudspeaker to the 
reference microphone, which can be a major source of instability, has no counterpart 
in the case of narrowband ANC systems. Hence, the narrowband ANC systems are less 
prone to instability problems and thus are easier to design and implement. 


16.2.1 Waveform Synthesis Method 


The waveform synthesis method constructs a sign-inverted version of the periodic noise,
viz., the anti-noise, at the canceling loudspeaker through the following mechanism.
The ANC filter is chosen to be an FIR filter with a length equal to one period of the
noise/anti-noise. It is excited by a sequence of impulses at a spacing equal to one period
of the anti-noise, and its coefficients are adjusted to converge toward the samples of the
anti-noise. This concept is presented in Figure 16.11.

Figure 16.10 Adaptive filtering structure for narrowband ANC in an air duct.


Analysis with the Secondary Path Excluded 


To gain some insight into the operation of the waveform synthesis method, we ignore the
secondary path between the canceling loudspeaker output and the error microphone and
accordingly present the equivalent block diagram of Figure 16.12. We assume that the
periodic noise has a period of N samples. Hence, at the time instant n, the ANC filter
transfer function may be written as

$$W(z) = \sum_{i=0}^{N-1} w_i(n) z^{-i} \tag{16.15}$$

The input to W(z) is

$$x(n) = \begin{cases} 1, & n = \text{any integer multiple of } N \\ 0, & \text{otherwise} \end{cases} \tag{16.16}$$

The desired signal d(n) in the adaptive filtering setup of Figure 16.12 is the noise signal
that reaches the error microphone. The adder represents the error microphone.

The setup presented in Figure 16.12 has the following interesting property. When the
LMS algorithm is used to adapt the coefficients of W(z) and the block diagram presented
in Figure 16.12 is treated as a system with input d(n) and output e(n), it behaves like
a linear time-invariant system. We prove this by showing that d(n) and e(n) are related
through a difference equation with a set of time-invariant coefficients.

We first note that the LMS recursion in Figure 16.12 is obtained as

$$\mathbf{w}(n+1) = \mathbf{w}(n) - 2\mu e(n)\mathbf{x}(n) \tag{16.17}$$

Figure 16.11 Waveform synthesis method applied to a narrowband ANC in an air duct.

Figure 16.12 An equivalent diagram of Figure 16.11 when the secondary path between the
canceling loudspeaker output and the error microphone is ignored.


Next, we note that for a given n,

$$\mathbf{x}(n) = [0\;\; \cdots\;\; 0\;\; 1\;\; 0\;\; \cdots\;\; 0]^T \tag{16.18}$$

where the term “1” is at the ith position and $i = i(n) \triangleq (n \text{ modulo } N)$. This implies that
at the time instant n only the i(n)th element of $\mathbf{w}(n)$ will be updated, hence, Eq. (16.17)
may be written as

$$w_{i(n)}(n+1) = w_{i(n)}(n) - 2\mu e(n) \tag{16.19}$$
On the other hand, recalling

$$y(n) = \sum_{i=0}^{N-1} w_i(n)x(n-i) \tag{16.20}$$

and using Eq. (16.18), one will find that

$$y(n) = w_{i(n)}(n) \tag{16.21}$$

Moreover, we note that the updated tap weight $w_{i(n)}(n+1)$ of Eq. (16.19) remains
unchanged until the time instant n + N and, hence,

$$y(n+N) = w_{i(n)}(n+N) = w_{i(n)}(n+1) \tag{16.22}$$

Using Eqs. (16.21) and (16.22) in Eq. (16.17), we obtain

$$y(n+N) = y(n) - 2\mu e(n) \tag{16.23}$$

Moreover, considering the fact that d(n) + y(n) = e(n) and d(n + N) + y(n + N) =
e(n + N), Eq. (16.23) may be rearranged as

$$d(n+N) - d(n) = e(n+N) - (1-2\mu)e(n) \tag{16.24}$$

Finally, replacing n by n − N in Eq. (16.24), we obtain

$$d(n) - d(n-N) = e(n) - (1-2\mu)e(n-N) \tag{16.25}$$


This is the linear difference equation that relates d(n) and e(n).
Taking the z-transforms of both sides of Eq. (16.25) and rearranging the result, we
obtain

$$H(z) = \frac{E(z)}{D(z)} = \frac{1 - z^{-N}}{1 - (1-2\mu)z^{-N}} \tag{16.26}$$


This is the transfer function that relates the original noise, d(n), and its suppressed version,
e(n). H(z) has N zeros at the equally spaced angle positions $0, 2\pi/N, 4\pi/N, \ldots,
2(N-1)\pi/N$ on the unit circle, |z| = 1, and N poles at the same angles, but on the
circle $|z| = \sqrt[N]{1-2\mu}$. These poles and zeros are presented in Figure 16.13, for N = 8.
The half-power/3 dB bandwidth of each null is given by

$$BW \approx 2\left(1 - \sqrt[N]{1-2\mu}\right) \tag{16.27}$$

The reader may refer to the discussion regarding Eq. (16.52), in the following, for derivation
of a similar equation.
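
The time-invariant relation between d(n) and e(n) can be confirmed by simulation. The sketch below (Python with NumPy, an assumption of this illustration) runs the LMS recursion with the impulse-train input for an arbitrary d(n) and checks that $d(n) - d(n-N) = e(n) - (1-2\mu)e(n-N)$ holds sample by sample. It also evaluates the pole radius $(1-2\mu)^{1/N}$ and the corresponding notch bandwidth $2(1 - (1-2\mu)^{1/N})$. The function name and the test signals are hypothetical.

```python
import numpy as np

def waveform_synthesis(d, N, mu):
    """LMS adaptation of an N-tap filter excited by one impulse every N samples."""
    w = np.zeros(N)
    x_buf = np.zeros(N)                              # x(n), x(n-1), ..., x(n-N+1)
    e = np.zeros(len(d))
    for n in range(len(d)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = 1.0 if n % N == 0 else 0.0        # impulse train input
        e[n] = d[n] + w @ x_buf                      # error microphone adds d and y
        w = w - 2 * mu * e[n] * x_buf                # only one tap changes per sample
    return e

N, mu = 8, 0.2
rng = np.random.default_rng(3)
d = rng.standard_normal(400)                         # arbitrary (not periodic) input
e = waveform_synthesis(d, N, mu)
lhs = d[N:] - d[:-N]
rhs = e[N:] - (1 - 2 * mu) * e[:-N]
print(np.max(np.abs(lhs - rhs)))                     # mismatch of the difference equation

d_per = np.tile(rng.standard_normal(N), 60)          # periodic noise with period N
e_per = waveform_synthesis(d_per, N, mu)
print(np.max(np.abs(e_per[-N:])))                    # residual after 60 periods

radius = (1 - 2 * mu) ** (1.0 / N)                   # pole radius
print(radius, 2 * (1 - radius))                      # and approximate notch bandwidth
```

For a noise that is exactly periodic with period N, the error shrinks by the factor $1 - 2\mu$ from one period to the next.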

Figure 16.14 presents a plot of the magnitude response of H(z), when N = 8 and
$\mu = 0.4$. As seen, H(z) is a multinotch filter with N notch frequencies in the interval
$0 \le \omega < 2\pi$. Also, the step-size parameter $\mu$ determines the bandwidth of each notch,
according to Eq. (16.27). A smaller $\mu$ results in a narrower band notch.

Figure 16.13 Zeros and poles arrangement of H(z), for N = 8.

Figure 16.14 An example of magnitude response of the multinotch filter H(z), for N = 8 and
$\mu = 0.2$.

In some applications, it is desirable to keep some residual of the noise; that is, noise
cancellation should be purposefully kept imperfect. This can be easily achieved by replacing
the LMS recursion (16.17) by its leaky counterpart, viz.,

$$\mathbf{w}(n+1) = \beta\mathbf{w}(n) - 2\mu e(n)\mathbf{x}(n) \tag{16.28}$$


where $\beta$ is a constant close to, but smaller than, 1. Starting with Eq. (16.28), and following
a line of derivation similar to the one that led to Eq. (16.26), we obtain

$$H(z) = \frac{E(z)}{D(z)} = \frac{1 - \beta z^{-N}}{1 - (\beta - 2\mu)z^{-N}} \tag{16.29}$$

To understand the impact of the leakage parameter $\beta$ on the magnitude response of
H(z), we note that for $z = e^{j\omega_k}$ and values of $\omega_k = 2k\pi/N$, for k = 0, 1, ..., N − 1,

$$|H(e^{j\omega_k})| = \frac{1-\beta}{1-\beta+2\mu} \approx \frac{1-\beta}{2\mu} \tag{16.30}$$


When $\beta = 1$, that is, when the conventional LMS recursion (16.17) is used, $|H(e^{j\omega_k})| = 0$.
That is, the nulls are perfect. On the other hand, when $\beta < 1$, the gain of the nonideal nulls
is given by Eq. (16.30). Clearly, to have meaningful nulls, the condition $|H(e^{j\omega_k})| \ll 1$
or, equivalently,

$$1 - \beta \ll 2\mu \tag{16.31}$$

should hold.
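The null depth of the leaky recursion can be tabulated with a few lines of code. This is a sketch only, assuming the leaky transfer function $H(z) = (1 - \beta z^{-N})/(1 - (\beta - 2\mu)z^{-N})$ of Eq. (16.29); the function name and the listed values of $\beta$ are hypothetical examples.

```python
import cmath
import math

def leaky_response(omega, N, mu, beta):
    """H(e^{j omega}) from Eq. (16.29) for the leaky LMS recursion."""
    z_to_minus_N = cmath.exp(-1j * N * omega)
    return (1 - beta * z_to_minus_N) / (1 - (beta - 2 * mu) * z_to_minus_N)

N, mu = 8, 0.2
omega_k = 2 * math.pi * 3 / N        # one of the N notch frequencies (k = 3)

results = []
for beta in (1.0, 0.999, 0.99, 0.9):
    predicted = (1 - beta) / (1 - beta + 2 * mu)            # Eq. (16.30)
    measured = abs(leaky_response(omega_k, N, mu, beta))    # direct evaluation
    results.append((beta, predicted, measured))
    print(f"beta = {beta}: predicted {predicted:.6f}, measured {measured:.6f}")
```

As $\beta$ moves away from 1, the residual gain at the notch grows, and the null stays deep only while $1 - \beta$ is much smaller than $2\mu$.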


Analysis with the Secondary Path $S_2(z)$ Included


Consider the case where the secondary path between the canceling loudspeaker and the
error microphone, $S_2(z)$, is taken into account and the filtered-X LMS algorithm is used
for the adaptation of W(z). Following the same line of argument as the one that led to
Figure 16.6, here, we obtain the equivalent block diagram of Figure 16.15. Also shown
in Figure 16.15 are the power spectral density $\Phi_{xx}(e^{j\omega})$ of the signal generator output (a
periodic set of impulses; see Figure 16.11) as well as the power spectral density $\Phi_{x'x'}(e^{j\omega})$
of the secondary path output when it is subject to the input x(n). In the time domain, the
latter will be a periodic sequence with the period of N samples.

Figure 16.15 An equivalent block diagram of the narrowband ANC in an air duct when the
secondary path $S_2(z)$ is considered.

Now comparing Figure 16.15 with Figure 16.12, one will find that while the input to
W(z) in the latter has a flat spectrum at all of its frequency components (the fundamental and
harmonics), in Figure 16.15 the input to W(z) is spectrally shaped by the secondary path,
$S_2(z)$. This variation of the spectral components of the input to W(z) has an impact on the
transfer function H(z). Recalling from Chapter 6 that the variation of the signal power at
the input of an LMS-based adaptive filter is equivalent to scaling its step-size parameter
with the input signal power, one may intuitively argue that the transfer function H(z) in
the case of Figure 16.15 is obtained from Eq. (16.26) by replacing $\mu$ with $\mu S_2(z)S_2(z^{-1})$.
This leads to

$$H(z) = \frac{E(z)}{D(z)} = \frac{1 - z^{-N}}{1 - \left(1 - 2\mu S_2(z)S_2(z^{-1})\right)z^{-N}} \tag{16.32}$$

Note that when z varies on the unit circle

$$H(e^{j\omega}) = \frac{1 - e^{-jN\omega}}{1 - \left(1 - 2\mu|S_2(e^{j\omega})|^2\right)e^{-jN\omega}} \tag{16.33}$$


The transfer function (16.32) can also be obtained through a mathematical derivation.
One such method starts with the ANC block diagram presented in Figure 16.16a. Assuming
that $\hat{S}_2(z) = S_2(z)$, the block diagram shown in Figure 16.16a can be rearranged as
in Figure 16.16b. Next, we note that as in the case of Eq. (16.21), here,

$$y'(n) = w_{n \bmod N}(n) \tag{16.34}$$

Moreover,

$$\begin{aligned} w_{(n+N) \bmod N}(n+N) &= w_{n \bmod N}(n+N) \\ &= w_{n \bmod N}(n) - 2\mu\left(e'(n) \star s(n)\right) \end{aligned} \tag{16.35}$$

where

$$s(n) = s_2(n) \star s_2(-n) \tag{16.36}$$


and $s_2(n)$ is the inverse z-transform of $S_2(z)$. Substituting Eq. (16.34) in Eq. (16.35),
we obtain

$$y'(n+N) = y'(n) - 2\mu\left(e'(n) \star s(n)\right) \tag{16.37}$$

Moreover, noting that $y'(n) = e'(n) - d'(n)$, Eq. (16.37) may be written as

$$e'(n+N) - d'(n+N) = e'(n) - d'(n) - 2\mu\left(e'(n) \star s(n)\right) \tag{16.38}$$

Taking the z-transform on both sides of Eq. (16.38) and recalling Eq. (16.36), we obtain

$$z^{N}E'(z) - z^{N}D'(z) = E'(z) - D'(z) - 2\mu E'(z)S_2(z)S_2(z^{-1}) \tag{16.39}$$
Moreover, from Figure 16.16b, one may observe that

$$E'(z) = \frac{E(z)}{S_2(z)} \tag{16.40}$$

and

$$D'(z) = \frac{D(z)}{S_2(z)} \tag{16.41}$$

Substituting Eqs. (16.40) and (16.41) in Eq. (16.39), one obtains Eq. (16.32).

Figure 16.16 (a) Filtered-X LMS algorithm applied to a narrowband ANC system and (b) a
rearrangement of the system when $\hat{S}_2(z) = S_2(z)$.

Figure 16.17 An example of magnitude response of the multinotch filter H(z), for N = 8 and
$\mu = 0.2$, and in the presence of a secondary path $S_2(z) = 0.5 + 0.5z^{-1}$.

To develop a more in-depth understanding of the effect of the secondary path $S_2(z)$
on the performance of the multinotch filter H(z) of Eq. (16.32), Figure 16.17 presents a
counterpart of Figure 16.14, when a secondary path $S_2(z) = 0.5 + 0.5z^{-1}$ has been added.
To explain the magnitude response plot of Figure 16.17, we note that, here,

$$|S_2(e^{j\omega})|^2 = \cos^2\frac{\omega}{2} \tag{16.42}$$

Moreover, we note that $\cos^2(\omega/2)$ is equal to 1 at $\omega = 0$ and decreases as $\omega$ varies between
0 and $\pi$. It reaches a minimum of zero at $\omega = \pi$, and increases toward 1, as $\omega$ varies
between $\pi$ and $2\pi$. The variation of $|S_2(e^{j\omega})|^2$ has the following impact on the magnitude
response of H(z). Near zero, where $|S_2(e^{j\omega})|^2 \approx 1$, there is very little difference between
the plot in Figure 16.17 and its counterpart in Figure 16.14. On the other hand, as $\omega$
approaches $\pi$, and thus $|S_2(e^{j\omega})|^2$ decreases, the bandwidths of the notches decrease. At
$\omega = \pi$, where $|S_2(e^{j\omega})|^2 = 0$, the notch disappears from the response. This observation
shows that when the secondary path $S_2(z)$ has a null at any of the harmonics of the noise,
the ANC system discussed in this section is incapable of removing that harmonic.
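The narrowing of the notches can be illustrated numerically. The sketch below assumes the unit-circle response $H(e^{j\omega}) = (1 - e^{-jN\omega})/(1 - (1 - 2\mu|S_2(e^{j\omega})|^2)e^{-jN\omega})$ of Eq. (16.33) with $S_2(z) = 0.5 + 0.5z^{-1}$. It evaluates the gain at a small, arbitrarily chosen offset from each notch center; a gain closer to 1 at that fixed offset means a narrower notch. The function names and the offset value are illustrative.

```python
import cmath
import math

def s2_power(omega):
    """|S2(e^{j omega})|^2 for S2(z) = 0.5 + 0.5 z^-1 (equals cos^2(omega/2))."""
    return abs(0.5 + 0.5 * cmath.exp(-1j * omega)) ** 2

def response_with_secondary_path(omega, N, mu):
    """H(e^{j omega}) from Eq. (16.33)."""
    e = cmath.exp(-1j * N * omega)
    return (1 - e) / (1 - (1 - 2 * mu * s2_power(omega)) * e)

N, mu = 8, 0.2
offset = 0.01   # small offset from each notch centre (assumed value)

# Exactly at omega = pi the expression is 0/0, so every notch is probed
# slightly off centre instead.
gains = []
for k in range(N // 2 + 1):                  # notch centres from 0 to pi
    omega_k = 2 * math.pi * k / N
    gain = abs(response_with_secondary_path(omega_k + offset, N, mu))
    gains.append(gain)
    print(f"notch at {omega_k:.4f} rad: gain at offset = {gain:.4f}")
```

The gain at the fixed offset increases steadily from the notch at $\omega = 0$ to the one at $\omega = \pi$, where it is essentially 1, in line with the disappearance of that notch.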


Synchronization 


The underlying assumption made in the above derivations was that the rate of the input
samples to the ANC filter W(z) was synchronized with the fundamental frequency of
the noise such that each period of the noise was exactly equal to N samples. Such
synchronization can be achieved in practice by a proper design of the tachometer. The
tachometer should be designed so that it generates N pulses per full rotation of
the noise-generating rotor, for example, by placing N equally spaced sensors (magnets)
around the shaft of the rotor.

16.2.2 Adaptive Notch Filters


In the cases where each period of the noise is long (i.e., it has a low fundamental
frequency), and hence the number of taps in the ANC filter W(z) is large, the waveform
synthesis method may be too complex to implement. In such cases, an alternative solution
that directly synthesizes the fundamental and harmonic components of the noise
may be the preferred choice. Figure 16.18 presents this solution for the case where the
noise is a single tone at the frequency $\omega_0$. A two-tap linear combiner with the tap inputs
$x_0(n) = \cos(\omega_0 n)$ and $x_1(n) = \sin(\omega_0 n)$ and the coefficients $w_0(n)$ and $w_1(n)$ is used to
synthesize a sign-reversed replica of the tonal noise of the acoustic signal

$$d(n) = a\cos(\omega_0 n + \phi) + v(n) \tag{16.43}$$

In Eq. (16.43), $a$ and $\phi$ are, respectively, the amplitude and phase of the tonal noise
and v(n) is a broadband process/noise. Clearly, the setup presented in Figure 16.18 is
only capable of canceling the tonal component of d(n), leaving its broadband component,
v(n), intact.

The adder on the acoustic side of Figure 16.18 refers to the error microphone. The error
signal e(n) from the microphone is fed back to the left side of the figure for adaptation
of the tap weights $w_0$ and $w_1$. Upon convergence, the tap weights $w_0$ and $w_1$
approach their optimal values

$$w_{0,\mathrm{o}} = a\cos\phi \tag{16.44}$$

and

$$w_{1,\mathrm{o}} = -a\sin\phi \tag{16.45}$$

This setting of $w_0$ and $w_1$ results in the residual error e(n) = v(n). Thus, as in the case
of the waveform synthesis method, the setup in Figure 16.18 acts like a notch filter with a perfect
notch at $\omega = \omega_0$. In fact, a careful analysis of the signal flows in Figure 16.18, presented in
Widrow et al. (1975), reveals that d(n) and e(n) are related through the transfer function

$$H(z) = \frac{1 - 2z^{-1}\cos\omega_0 + z^{-2}}{1 - 2(1 - \mu)z^{-1}\cos\omega_0 + (1 - 2\mu)z^{-2}} \tag{16.46}$$

Figure 16.18 A notch filtering setup with the notch frequency of $\omega_0$.


An alternative derivation of the same result can be found in Glover (1977); see also Elliott
and Nelson (1993). In Appendix 16A, a proof of Eq. (16.46) that follows the work of
Glover (1977) is presented.

The transfer function (16.46) has a pair of zeros at

$$z = e^{\pm j\omega_0} \tag{16.47}$$

which implies that H(z) has a perfect notch at the angular frequency $\omega = \omega_0$. The poles
of H(z) are located at

$$z = (1 - \mu)\cos\omega_0 \pm j\sqrt{(1 - 2\mu) - (1 - \mu)^2\cos^2\omega_0} \tag{16.48}$$

These poles are inside the unit circle at a radial distance

$$r = \sqrt{1 - 2\mu} \tag{16.49}$$

from the center of the unit circle. They are also located at the angular position

$$\theta = \pm\cos^{-1}\left(\frac{1 - \mu}{\sqrt{1 - 2\mu}}\cos\omega_0\right) \tag{16.50}$$

Since typically, in practice, $\mu \ll 1$, and for such cases the approximation $\sqrt{1 - 2\mu} \approx
1 - \mu$ holds, one will find that $r \approx 1 - \mu$ and $\theta \approx \pm\omega_0$; hence, Eq. (16.48) reduces to

$$z \approx (1 - \mu)e^{\pm j\omega_0} \tag{16.51}$$

The zeros and poles given in Eqs. (16.47) and (16.51), respectively, are presented in
Figure 16.19a. Also shown in this figure is the half-power bandwidth of the notch bands.
This concept is further clarified in Figure 16.19b, where an example of the magnitude
response of H(z) is presented over the range $0 \le \omega \le \pi$. The half-power/3 dB bandwidth
of the notch filter (16.46) is seen from the presented diagrams to be

$$\mathrm{BW} \approx 2\mu \tag{16.52}$$
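The pole locations and the bandwidth estimate can be verified with a short numerical sketch. It assumes the notch transfer function of Eq. (16.46) written with the denominator $1 - 2(1 - \mu)z^{-1}\cos\omega_0 + (1 - 2\mu)z^{-2}$, which is the form implied by the poles in Eq. (16.48); the function names and the values $\omega_0 = \pi/4$, $\mu = 0.01$ are arbitrary examples.

```python
import cmath
import math

def notch_response(omega, omega0, mu):
    """H(e^{j omega}) of the single-tone notch filter, Eq. (16.46)."""
    z1 = cmath.exp(-1j * omega)                      # z^-1 on the unit circle
    num = 1 - 2 * z1 * math.cos(omega0) + z1 ** 2
    den = 1 - 2 * (1 - mu) * z1 * math.cos(omega0) + (1 - 2 * mu) * z1 ** 2
    return num / den

def notch_poles(omega0, mu):
    """Pole pair from Eq. (16.48)."""
    real = (1 - mu) * math.cos(omega0)
    imag = math.sqrt((1 - 2 * mu) - real ** 2)
    return complex(real, imag), complex(real, -imag)

omega0, mu = math.pi / 4, 0.01

pole, _ = notch_poles(omega0, mu)
radius = abs(pole)              # Eq. (16.49): sqrt(1 - 2 mu)
angle = cmath.phase(pole)       # Eq. (16.50): close to omega0 when mu << 1

centre_gain = abs(notch_response(omega0, omega0, mu))
edge_power = abs(notch_response(omega0 + mu, omega0, mu)) ** 2  # half of BW = 2 mu

print("pole radius:", radius, "expected:", math.sqrt(1 - 2 * mu))
print("pole angle:", angle, "omega0:", omega0)
print("gain at notch centre:", centre_gain)
print("power gain at omega0 + mu:", edge_power)
```

With these values the power gain at $\omega_0 + \mu$ is close to one half, consistent with a half-power bandwidth of about $2\mu$.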


Adding the Secondary Path $S_2(z)$


When the secondary path $S_2(z)$ is included in Figure 16.18, the filtered-X LMS algorithm
should be used for adaptation of the tap weights $w_0$ and $w_1$. Moreover, recalling the
filtered-X LMS algorithm, $x_0(n)$ and $x_1(n)$ should be passed through the estimate $\hat{S}_2(z)$
of the secondary path $S_2(z)$ to generate the respective filtered signals $x'_0(n)$ and $x'_1(n)$,
which subsequently will be used for adaptation of the tap weights $w_0$ and $w_1$. On the
other hand, as $x_0(n)$ and $x_1(n)$ are sinusoidal signals, $x'_0(n)$ and $x'_1(n)$ can be obtained
from $x_0(n)$ and $x_1(n)$, respectively, by introducing a change in amplitude and a phase
shift. The phase shift can be realized by delaying the signals $x_0(n)$ and $x_1(n)$, say, by
$\Delta$ samples, where $\Delta$ is equal to the ratio of the desired delay in seconds divided by
$T_s$ (the sampling period), rounded to the nearest integer. The amplitude change can be
compensated for by adjusting the step-size parameter of the LMS algorithm; see Chapter 6
for the relationship between the step-size parameter in the LMS algorithm and the signal
level at the adaptive filter input.

Figure 16.19 (a) Location of zeros and poles of the transfer function (16.46) and (b) its magnitude
response.

Figure 16.20 Block diagram of a notch filter with the secondary path $S_2(z)$ included.

Figure 16.20 presents a block diagram of a notch filter setup with the secondary path
included, and the filtered-X LMS algorithm used for adaptation of the filter coefficients. In
this figure, we have also used a delay of $\delta$ samples to generate $\sin(\omega_0 n)$ from $\cos(\omega_0 n)$.
One may note that the assumption $x_0(n - \delta) = x_1(n)$ may not be accurate, unless the
equation $\omega_0\delta = \pi/2$ is satisfied for some integer $\delta$. However, we note that a slight mismatch
of $x_0(n - \delta)$ and the optimum choice of $x_1(n) = \sin(\omega_0 n)$ has no significant impact on
the performance of the ANC system.

In Appendix 16B, the relationship between d(n) and e(n) of Figure 16.20 has been
studied, and it has been shown that when $\omega_0\delta = \pi/2$ holds, d(n) and e(n) are related through
the transfer function

$$H(z) = \frac{1 - 2z^{-1}\cos\omega_0 + z^{-2}}{1 - 2z^{-1}\cos\omega_0 + z^{-2} + 2\mu S_2(z)\left(z^{-1}\cos(\omega_0 - \theta) - z^{-2}\cos\theta\right)} \tag{16.53}$$

where $\theta = \omega_0\Delta$.


Extension to Multiple Notches 


The block diagram presented in Figure 16.20 can be extended to introduce multiple
notches in the response between d(n) and e(n). Such an extension for the case where
the notches are introduced at the frequencies $\omega_0$ and $\omega_1$ is presented in Figure 16.21.
Extension to the cases with more than two notch frequencies follows the same concept
and thus is obvious.

Figure 16.21 Block diagram of a double notch filter with the secondary path $S_2(z)$ included. The
notch frequencies are at $\omega_0$ and $\omega_1$.


16.3 Feedback Single-Channel ANC 


Consider the feedback ANC system of Figure 16.2, and let the acoustical link between the
canceling loudspeaker and the error microphone be denoted as S(z). This leads to the block
diagram of Figure 16.22. This is a feedback system with the open loop transfer function

$$G(z) = -\frac{Y(z)}{E(z)} = -W(z)S(z) \tag{16.54}$$

and the closed loop transfer function

$$H(z) = \frac{E(z)}{D(z)} = \frac{1}{1 + G(z)} = \frac{1}{1 - W(z)S(z)} \tag{16.55}$$

Figure 16.22 Block diagram of a feedback single-channel ANC system.


From the theory of control systems, we recall that to avoid possible instability of the
system presented in Figure 16.22 one should make sure that if at any frequency the phase
of $G(e^{j\omega})$ reaches 180°, $|G(e^{j\omega})|$ remains strictly smaller than 1. Ideally, if one can
choose W(z) such that the phase of $G(e^{j\omega})$ remains around zero for all values of $\omega$, it
will be assured that the ANC will be a strictly stable system. In that case, to reduce e(n)
to a value close to zero, all one needs to do is to add a real and positive high gain to
W(z). This should be obvious since, according to Eq. (16.55),

$$|E(e^{j\omega})| = \frac{|D(e^{j\omega})|}{|1 + G(e^{j\omega})|} \tag{16.56}$$

However, the task of reaching the condition $\angle G(e^{j\omega}) \approx 0$, for all values of $\omega$, may not
be achievable in practice.

The most common method of handling the feedback ANC problem is based on the
following philosophy/intuition. The noise signal d(n), a sign-reversed version of which we
wish to reconstruct at the point y(n) in Figure 16.22, is not available. On the other hand,
we may recall from the theory of Wiener filters (particularly, the principle of correlation
discussed in Chapter 3) that to obtain a good estimate of d(n) from a process x(n), x(n)
should be highly correlated with d(n). But, we just said d(n) is not available. This puzzle
can be solved by adopting the setup presented in Figure 16.23. This setup assumes that
an estimate $\hat{S}(z)$ of S(z) is available and passes the output of W(z), $y'(n)$, through this
estimate to obtain an estimate $\hat{y}(n)$ of y(n). Moreover, noting that

$$d(n) = e(n) - y(n) \tag{16.57}$$

an estimate $\hat{d}(n)$ of d(n) is obtained by subtracting $\hat{y}(n)$ from e(n). So,

$$x(n) = \hat{d}(n) = e(n) - \hat{y}(n) \tag{16.58}$$

Figure 16.23 A setup for implementation of the feedback ANC system of Figure 16.22.


Analysis 


We may note that when $\hat{d}(n)$ is a good estimate of d(n), to reduce the residual error e(n),
W(z) has to be set such that

$$W(z)S(z) = -1 \tag{16.59}$$

This will lead to $y(n) = -\hat{d}(n)$ and, thus, $e(n) = d(n) - \hat{d}(n)$ will be a small error signal.
One may also note that Eq. (16.59) implies that $W(z) = -1/S(z)$. This setting of W(z) may
not be possible in practice as $1/S(z)$ may be a noncausal transfer function. Hence, in practice,
one has to resort to an approximation $W(z) \approx -1/S(z)$. Thus, noise suppression in feedback
ANC systems will usually be a partial one.

More insight can be developed by deriving the transfer function H(z) between d(n)
and e(n). To this end, we first note that

$$\frac{Y'(z)}{E(z)} = \frac{W(z)}{1 + W(z)\hat{S}(z)} \tag{16.60}$$

Next,

$$\frac{Y(z)}{E(z)} = \frac{W(z)S(z)}{1 + W(z)\hat{S}(z)} \tag{16.61}$$

Finally, we obtain

$$H(z) = \frac{1}{1 - Y(z)/E(z)} = \frac{1 + W(z)\hat{S}(z)}{1 + W(z)\left(\hat{S}(z) - S(z)\right)} \tag{16.62}$$

When $\hat{S}(z) = S(z)$, Eq. (16.62) reduces to

$$H(z) = 1 + W(z)S(z) \tag{16.63}$$

Moreover, if 1/S(z) is stable and causal, and W(z) is set equal to −1/S(z), perfect
suppression of d(n) will occur. As the stability and causality of 1/S(z) usually are not
satisfied in practice, a compromise choice that minimizes the residual error e(n) should
be found. The filtered-X LMS algorithm (discussed below) searches for such a solution.
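The effect of a mismatch between the secondary path and its estimate can be explored numerically. The following sketch assumes the closed-loop response $H(z) = (1 + W(z)\hat{S}(z))/(1 + W(z)(\hat{S}(z) - S(z)))$ of Eq. (16.62) and evaluates it on the unit circle for FIR choices of W(z), S(z), and $\hat{S}(z)$. All filter taps below are hypothetical values picked for illustration; they are not taken from the text.

```python
import cmath

def fir_response(taps, omega):
    """Frequency response of an FIR filter with the given tap list."""
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(taps))

def feedback_anc_response(w, s, s_hat, omega):
    """H(e^{j omega}) from Eq. (16.62)."""
    W = fir_response(w, omega)
    S = fir_response(s, omega)
    S_hat = fir_response(s_hat, omega)
    return (1 + W * S_hat) / (1 + W * (S_hat - S))

# Hypothetical secondary path with a one-sample delay, so that 1/S(z) is
# noncausal and W(z) = -1/S(z) cannot be realized exactly.
s = [0.0, 0.8, 0.3]
s_hat_mismatched = [0.0, 0.75, 0.35]   # imperfect offline estimate (assumed)
w = [-1.0]                             # crude illustrative W(z), not optimized
omega = 0.2

h_matched = feedback_anc_response(w, s, s, omega)
h_direct = 1 + fir_response(w, omega) * fir_response(s, omega)   # Eq. (16.63)
h_mismatched = feedback_anc_response(w, s, s_hat_mismatched, omega)

print("matched estimate, Eq. (16.62):", h_matched)
print("direct form, Eq. (16.63):     ", h_direct)
print("mismatched estimate:          ", h_mismatched)
```

With a perfect estimate the two expressions agree, as expected; with a mismatched estimate the denominator of Eq. (16.62) no longer equals 1 and the response changes accordingly.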


The Inverse Modeling Point of View


In the ANC system presented in Figure 16.23, in the ideal case where $\hat{S}(z) = S(z)$, one
finds that $\hat{d}(n) = d(n)$ and accordingly Figure 16.23 simplifies to Figure 16.24. Note that
in this figure, we have also changed the order of the cascade of W(z) and S(z) to S(z)
and W(z).

One may compare Figure 16.24 with Figure 3.9 and note that this is an inverse modeling
(or, channel equalization) problem. Moreover, as the inverse of S(z) may not be a causal
system, the desired inverse modeling solution W(z) = −1/S(z) may only be attainable
to a certain degree. Typical noise reductions that have been reported in the literature are
usually on the order of 10 to 15 dB.


Filtered-X LMS Algorithm 


Application of the filtered-X LMS algorithm to adaptation of W(z) in Figure 16.23 is a
rather straightforward task. Figure 16.25 presents a block diagram of the filtered-X LMS
algorithm when applied to the feedback ANC system of Figure 16.23. As in the case
of the feedforward ANC systems, here also it is assumed that the estimate $\hat{S}(z)$ of the
secondary path S(z) is obtained offline.

Figure 16.24 The equivalent block diagram to Figure 16.23 when $\hat{S}(z) = S(z)$.

Figure 16.25 Filtered-X LMS algorithm implementation of the feedback ANC system of
Figure 16.23.


16.4 Multichannel ANC Systems 


The single-channel ANC systems that were introduced in the previous sections of this
chapter only work when the noise cancellation is limited to a narrow air duct (like an
engine exhaust) or to a small zone in the vicinity of the error microphone (e.g., in the
case of feedback ANC systems of Figure 16.2). In many applications, where the volume
in which the noise has to be suppressed is large (including wide air ducts), it may be
necessary to adopt ANC systems that use multiple reference microphones/sensors, multiple
canceling loudspeakers, and/or multiple error microphones. An air duct exemplifying this
scenario is presented in Figure 16.26.

Figure 16.26 may be viewed as a dual of Figure 16.9. The difference is that here all the
signals are vector signals; thus, the lines connecting different blocks/transfer functions are
drawn thicker. The transfer functions connecting different signal vectors have coefficients
that are matrices. For example, assuming an N-coefficient ANC filter $\mathbf{W}(z)$, it will have
the following form:

$$\mathbf{W}(z) = \sum_{i=0}^{N-1}\mathbf{W}_i z^{-i} \tag{16.64}$$

where the $\mathbf{W}_i$'s are the coefficient matrices of size $M_c \times M_r$, and $M_r$ and $M_c$ are the number
of reference microphones and the number of canceling loudspeakers, respectively. The
error signal $\mathbf{e}(n)$ is an $M_e \times 1$ vector and the filtered-X LMS algorithm minimizes the
cost function

$$\xi = E[\mathbf{e}^{\mathrm{T}}(n)\mathbf{e}(n)] \tag{16.65}$$

Figure 16.26 Multichannel ANC system for a wide air duct.


16.4.1 MIMO Blocks/Transfer Functions 


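The ANC filter $\mathbf{W}(z)$ and other blocks in Figure 16.26 are referred to as multi-input
multi-output (MIMO) systems. To gain a clearer understanding of a MIMO system,
we consider the two-input two-output system presented in Figure 16.27. There are four
links that connect the two inputs $x_0(n)$ and $x_1(n)$ and the two outputs $y_0(n)$ and $y_1(n)$.
Let these links be expressed by the transfer functions

$$H_{00}(z) = \frac{Y_0(z)}{X_0(z)} = a_{00} + b_{00}z^{-1} \tag{16.66}$$

$$H_{01}(z) = \frac{Y_0(z)}{X_1(z)} = a_{01} + b_{01}z^{-1} \tag{16.67}$$

$$H_{10}(z) = \frac{Y_1(z)}{X_0(z)} = a_{10} + b_{10}z^{-1} \tag{16.68}$$

$$H_{11}(z) = \frac{Y_1(z)}{X_1(z)} = a_{11} + b_{11}z^{-1} \tag{16.69}$$

From these, one will find that

$$\mathbf{y}(z) = \mathbf{H}(z)\mathbf{x}(z) \tag{16.70}$$

where $\mathbf{x}(z) = \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix}$, $\mathbf{y}(z) = \begin{bmatrix} Y_0(z) \\ Y_1(z) \end{bmatrix}$, and

$$\mathbf{H}(z) = \begin{bmatrix} H_{00}(z) & H_{01}(z) \\ H_{10}(z) & H_{11}(z) \end{bmatrix} = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix} + \begin{bmatrix} b_{00} & b_{01} \\ b_{10} & b_{11} \end{bmatrix} z^{-1} \tag{16.71}$$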

16.4.2 Derivation of the LMS Algorithm for MIMO Adaptive Filters 


To develop an insight into how an LMS algorithm may be developed
for minimization of the cost function (16.65), here, we present the development of an
LMS algorithm for the adaptive filtering setup that is presented in Figure 16.28. Once
this is understood, it should be a straightforward task for an interested reader to develop
similar algorithms in different contexts, including LMS algorithms for offline adaptation
of $\hat{S}_1(z)$ and $\hat{S}_2(z)$ as well as a filtered-X LMS algorithm for adaptation of the ANC filter
$\mathbf{W}(z)$ in the system setup of Figure 16.26.

Figure 16.27 Block diagram of a 2 × 2 MIMO system.

Figure 16.28 A multi-input multi-output adaptive filter.


The input $\mathbf{x}(n)$ to the adaptive filter $\mathbf{W}(z)$ is a vector of size $M_i \times 1$. The output $\mathbf{y}(n)$ of
$\mathbf{W}(z)$ and the desired output $\mathbf{d}(n)$ are vectors of size $M_o \times 1$. Accordingly, the coefficients
$\mathbf{W}_i$ of $\mathbf{W}(z)$ are matrices of size $M_o \times M_i$. Moreover,

$$\mathbf{y}(n) = \sum_{i=0}^{N-1}\mathbf{W}_i\mathbf{x}(n - i) \tag{16.72}$$

Also,

$$\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{y}(n) \tag{16.73}$$

To proceed, we note that the cost function (16.65) may be expanded as

$$\xi = \sum_{k=0}^{M_o - 1}\xi_k \tag{16.74}$$

where

$$\xi_k = E[e_k^2(n)] \tag{16.75}$$

and $e_k(n)$ is the kth element of $\mathbf{e}(n)$. Moreover, we note that

$$e_k(n) = d_k(n) - y_k(n) \tag{16.76}$$

where $d_k(n)$ and $y_k(n)$ are the kth elements of $\mathbf{d}(n)$ and $\mathbf{y}(n)$, respectively. In addition,
from Eq. (16.72), one may find that

$$y_k(n) = \sum_{i=0}^{N-1}\mathbf{w}_i^{(k)}\mathbf{x}(n - i) \tag{16.77}$$

where $\mathbf{w}_i^{(k)}$ denotes the kth row of $\mathbf{W}_i$. Substituting Eq. (16.77) in Eq. (16.76), and the
result in Eq. (16.75), one will find that minimization of $\xi_k$ involves optimization of the
kth rows of the coefficient matrices $\mathbf{W}_i$. This, in turn, implies that the subcost functions
$\xi_k$ in Eq. (16.74), for each k, can be optimized independently, because each depends on
a separate set of the elements of the $\mathbf{W}_i$'s.

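Before proceeding with the derivation of an LMS algorithm for minimization of $\xi_k$, it
is also instructive to derive an expression for $e_k(n)$. From the above discussions, one may
note that

$$e_k(n) = d_k(n) - \mathbf{w}_k^{\mathrm{T}}\tilde{\mathbf{x}}(n) \tag{16.78}$$

where

$$\mathbf{w}_k = \begin{bmatrix} \mathbf{w}_0^{(k)} & \mathbf{w}_1^{(k)} & \cdots & \mathbf{w}_{N-1}^{(k)} \end{bmatrix}^{\mathrm{T}} \tag{16.79}$$

and

$$\tilde{\mathbf{x}}(n) = \begin{bmatrix} \mathbf{x}(n) \\ \mathbf{x}(n-1) \\ \vdots \\ \mathbf{x}(n-N+1) \end{bmatrix} \tag{16.80}$$

Note that the vectors $\mathbf{w}_k$ and $\tilde{\mathbf{x}}(n)$ are column vectors of length $NM_i$.
One may now note that Eq. (16.78) has the same form as Eq. (3.7). Hence, following
a line of derivations similar to those in Chapter 3, one will find the optimum value of $\mathbf{w}_k$
that minimizes $\xi_k$ as

$$\mathbf{w}_{k,\mathrm{o}} = \mathbf{R}^{-1}\mathbf{p}_k \tag{16.81}$$

where $\mathbf{R} = E[\tilde{\mathbf{x}}(n)\tilde{\mathbf{x}}^{\mathrm{T}}(n)]$ and $\mathbf{p}_k = E[\tilde{\mathbf{x}}(n)d_k(n)]$. Also, since $e_k(n) = d_k(n) - y_k(n)$, the LMS
algorithm for the adaptation of $\mathbf{w}_k$ is trivially obtained as

$$\mathbf{w}_k(n+1) = \mathbf{w}_k(n) + 2\mu e_k(n)\tilde{\mathbf{x}}(n) \tag{16.82}$$

Repeating Eq. (16.82) for $k = 0, 1, \ldots, M_o - 1$, we will have the update equations for
all of the coefficients of $\mathbf{W}(z)$.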
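The row-wise structure of the MIMO LMS algorithm can be sketched in a few lines. The example below is an illustration under stated assumptions, not the book's implementation: it identifies a hypothetical, noise-free MIMO FIR system, stacks the current and past input vectors into one long vector, and updates each output row independently. With the error defined as $e_k(n) = d_k(n) - y_k(n)$, the update adds $2\mu e_k(n)$ times the stacked input vector to the kth stacked weight vector. The function name, the test system `W_true`, the step size, and the uniform random input are all choices made here.

```python
import random

def mimo_lms_identify(true_W, mu, num_iters, seed=0):
    """Row-wise LMS adaptation of a MIMO FIR filter (system identification)."""
    rng = random.Random(seed)
    N = len(true_W)             # number of coefficient matrices W_0 ... W_{N-1}
    M_o = len(true_W[0])        # number of outputs
    M_i = len(true_W[0][0])     # number of inputs
    # target[k] stacks the k-th rows of W_0 ... W_{N-1} into one vector.
    target = [[true_W[i][k][j] for i in range(N) for j in range(M_i)]
              for k in range(M_o)]
    w = [[0.0] * (N * M_i) for _ in range(M_o)]
    x_tilde = [0.0] * (N * M_i)      # stacked input [x(n); x(n-1); ...]
    for _ in range(num_iters):
        x_now = [rng.uniform(-1.0, 1.0) for _ in range(M_i)]
        x_tilde = x_now + x_tilde[:-M_i]          # shift in the newest sample
        for k in range(M_o):                      # each row adapts on its own
            d_k = sum(t * x for t, x in zip(target[k], x_tilde))
            y_k = sum(c * x for c, x in zip(w[k], x_tilde))
            e_k = d_k - y_k
            w[k] = [c + 2 * mu * e_k * x for c, x in zip(w[k], x_tilde)]
    return w, target

# Hypothetical 2-input, 2-output system with N = 2 coefficient matrices.
W_true = [[[0.5, -0.3], [0.2, 0.7]],
          [[0.1, 0.4], [-0.6, 0.25]]]
w_est, w_target = mimo_lms_identify(W_true, mu=0.05, num_iters=2000)
max_err = max(abs(a - b)
              for row_est, row_true in zip(w_est, w_target)
              for a, b in zip(row_est, row_true))
print("largest coefficient error after adaptation:", max_err)
```

Because no measurement noise is included in this sketch, the estimated coefficients converge to the true ones; with noisy desired signals the usual LMS misadjustment would remain.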
Appendix 16A: Derivation of Eq. (16.46)


Using the LMS algorithm to adapt the coefficients $w_0$ and $w_1$ in Figure 16.18, we have
the following update equation for $w_0$:

$$w_0(n+1) = w_0(n) - 2\mu e(n)x_0(n) \tag{16A.1}$$

Substituting

$$x_0(n) = \cos(\omega_0 n) = \frac{e^{j\omega_0 n} + e^{-j\omega_0 n}}{2} \tag{16A.2}$$

in Eq. (16A.1), we obtain

$$w_0(n+1) = w_0(n) - \mu e(n)\left(e^{j\omega_0 n} + e^{-j\omega_0 n}\right) \tag{16A.3}$$

Taking the z-transform on both sides of Eq. (16A.3), we get

$$zW_0(z) = W_0(z) - \mu\left(E(ze^{-j\omega_0}) + E(ze^{j\omega_0})\right) \tag{16A.4}$$

where we have noted that $w_0(n+1) \leftrightarrow zW_0(z)$, $e(n)e^{j\omega_0 n} \leftrightarrow E(ze^{-j\omega_0})$, and
$e(n)e^{-j\omega_0 n} \leftrightarrow E(ze^{j\omega_0})$. Equation (16A.4) may be rearranged as

$$W_0(z) = -\frac{\mu}{z - 1}\left(E(ze^{-j\omega_0}) + E(ze^{j\omega_0})\right) \tag{16A.5}$$

Similarly, using the LMS update equation of $w_1$ and recalling the identity

$$x_1(n) = \sin(\omega_0 n) = \frac{e^{j\omega_0 n} - e^{-j\omega_0 n}}{2j} \tag{16A.6}$$

we obtain

$$W_1(z) = -\frac{\mu}{z - 1}\cdot\frac{E(ze^{-j\omega_0}) - E(ze^{j\omega_0})}{j} \tag{16A.7}$$

On the other hand,

$$y(n) = w_0(n)\cos(\omega_0 n) + w_1(n)\sin(\omega_0 n) \tag{16A.8}$$

Taking the z-transform on both sides of Eq. (16A.8), and recalling Eqs. (16A.2) and (16A.6),
we obtain

$$Y(z) = \frac{W_0(ze^{-j\omega_0}) + W_0(ze^{j\omega_0})}{2} + \frac{W_1(ze^{-j\omega_0}) - W_1(ze^{j\omega_0})}{2j} \tag{16A.9}$$

Using Eqs. (16A.5) and (16A.7), after some straightforward manipulations, one will find
that Eq. (16A.9) converts to

$$Y(z) = \frac{-2\mu z\cos\omega_0 + 2\mu}{z^2 - 2z\cos\omega_0 + 1}E(z) \tag{16A.10}$$

Finally, noting that y(n) = e(n) − d(n), hence, Y(z) = E(z) − D(z), Eq. (16.46) is
obtained from Eq. (16A.10).


Appendix 16B: Derivation of Eq. (16.53) 


Consider the ANC system presented in Figure 16.20. Let $\omega_0\delta = \pi/2$; thus the lower input
to the linear combiner in Figure 16.20 will be exactly equal to $\sin(\omega_0 n)$. Also, we let
$\omega_0\Delta = \theta$. These lead to the following update equations for updating $w_0$ and $w_1$:

$$w_0(n+1) = w_0(n) - \mu e(n)\left(e^{j\omega_0 n}e^{j\theta} + e^{-j\omega_0 n}e^{-j\theta}\right) \tag{16B.1}$$

and

$$w_1(n+1) = w_1(n) + j\mu e(n)\left(e^{j\omega_0 n}e^{j\theta} - e^{-j\omega_0 n}e^{-j\theta}\right) \tag{16B.2}$$

respectively. Starting with Eqs. (16B.1) and (16B.2) and following the same line of
derivations that led to Eq. (16A.10), here, we obtain

$$Y'(z) = \frac{-2\mu z\cos(\omega_0 - \theta) + 2\mu\cos\theta}{z^2 - 2z\cos\omega_0 + 1}E(z) \tag{16B.3}$$

We also note from Figure 16.20 that

$$Y(z) = Y'(z)S_2(z) \tag{16B.4}$$

Substituting Eq. (16B.3) in Eq. (16B.4) and recalling that Y(z) = E(z) − D(z), one will
arrive at Eq. (16.53).
Synchronization and Equalization
in Data Transmission Systems


Channel equalization is one of the most widely explored topics in the field of adaptive
filters. The earliest form of adaptive filters was suggested in the 1960s as the developments
on digital modems began. A significant portion of the theory of adaptive filters in
those early days was developed in the context of adaptive equalizers and then applied to
other applications. In addition, the adaptive equalizers had their own specific problems
that had to be tackled separately. For instance, synchronization and time alignment of the
receiver with the transmit signal was not a trivial task at the beginning. It was almost a
decade after the invention of the adaptive equalizers that the more practical methods of
synchronization (such as cyclic equalizers discussed in Section 17.6) were developed. It
also took a number of years before those working on the channel equalizers could appreciate
the advantages of fractionally spaced equalizers over their symbol-spaced counterparts
(Section 17.4). Moreover, further advancement in the theory of adaptive equalizers led
to blind adaptation methods that would allow adaptation of equalizer coefficients without
transmitting any pilot/training symbols. The capacity-achieving maximum-likelihood
detectors and soft equalizers have also been introduced. Another important development
is an elegant signaling format that allows equalization in the frequency domain. These
varieties of equalization methods and the synchronization steps that are necessary to allow
their successful operation are presented in some detail in this chapter.

In this chapter, we need to move back and forth between continuous and discrete
time signals. We differentiate a continuous time signal and its discrete time counterpart
using the argument “t” (or $\tau$) to denote continuous time and the argument “n” (or m)
when reference is made to a discrete time index. For example, whereas we use x(t)
to represent a continuous time signal, the samples of x(t) at the time instants $t = nT_s$,
where $T_s$ is the sampling interval, are represented by the sequence x(n). Of course, this is
an abuse of notation; however, as long as we remain consistent in the use of notations,
there should be no confusion. As for the frequency domain signals, the Fourier transform
of a continuous time signal x(t) is denoted by $X(\Omega)$. For a discrete time sequence, we
keep the notation that was used in the previous chapters, that is, $X(e^{j\omega})$ denotes the
discrete time Fourier transform of the sequence x(n). In other words, whereas $\Omega$ is used
to denote the frequency of a continuous time sine-wave $x(t) = \sin(\Omega t)$, $\omega$ is used to
denote the normalized frequency (with respect to the sampling frequency $f_s = 1/T_s$) of


Adaptive Filters: Theory and Applications, Second Edition. Behrouz Farhang-Boroujeny. 
© 2013 John Wiley & Sons, Ltd. Published 2013 by John Wiley & Sons, Ltd. 


its sampled version $x(n) = \sin(\Omega nT_s) = \sin(\omega n)$. One may also note that the absolute $\Omega$
and the normalized $\omega$ frequencies are related according to the equation

$$\omega = \frac{\Omega}{f_s} \tag{17.1}$$


17.1 Continuous Time Channel Model 


In the study of adaptive equalizers, the channel is often modeled in its equivalent baseband
and discrete time form, even though the actual transmission happens over a passband radio
frequency (RF) and in continuous time form. In this section, we present a development
of such a channel model.

Figure 17.1 presents the block diagram of a digital quadrature-amplitude modulation
(QAM) communication system. The transmit data symbols, s(n), are streamed as
a sequence of impulses at the interval of T. We refer to T as symbol-space. This is
presented by the continuous time signal

$$s(t) = \sum_{n} s(n)\delta(t - nT) \tag{17.2}$$

where s(n) are data symbols from a QAM constellation.

In Figure 17.1, the sequence of the symbol modulated impulses, represented by the
continuous time signal s(t), is passed through a transmit pulse-shaping filter with the impulse
response $p_T(t)$. This results in a band-limited baseband signal, which subsequently
modulates a carrier with the frequency $\Omega_c$. Multiplication by $e^{j\Omega_c t}$ shifts the baseband signal
spectrum to the carrier band, and the block $\Re[\cdot]$ takes the real part of the result, thus
producing a duplicate image spectrum around $-\Omega_c$ as well. This leads to

$$X_{\mathrm{QAM}}(\Omega) = S(\Omega - \Omega_c)P_T(\Omega - \Omega_c) + S^*(-\Omega - \Omega_c)P_T^*(-\Omega - \Omega_c) \tag{17.3}$$

where $X_{\mathrm{QAM}}(\Omega)$ is the Fourier transform of $x_{\mathrm{QAM}}(t)$. Using Eq. (17.3), we obtain

$$\tilde{X}_{\mathrm{QAM}}(\Omega) = \left(S(\Omega - \Omega_c)P_T(\Omega - \Omega_c) + S^*(-\Omega - \Omega_c)P_T^*(-\Omega - \Omega_c)\right)C(\Omega) \tag{17.4}$$

where $\tilde{X}_{\mathrm{QAM}}(\Omega)$ and $C(\Omega)$ are the Fourier transforms of the channel output $\tilde{x}_{\mathrm{QAM}}(t)$ and
the channel impulse response c(t), respectively.
At the receiver side, the multiplication of $\tilde{x}_{\mathrm{QAM}}(t)$ by $e^{-j\Omega_c t}$ shifts the spectrum to the
left, by $\Omega_c$, and the receive filter removes the other portion of the signal spectrum that is
now located around $-2\Omega_c$. These steps lead to

$$X(\Omega) = S(\Omega)H(\Omega) \tag{17.5}$$

where

$$H(\Omega) = P_T(\Omega)P_R(\Omega)C(\Omega + \Omega_c) \tag{17.6}$$

This shows that the combination of all the blocks in the transmission path from the input
s(t) to the output x(t) is characterized by the transfer function $H(\Omega)$ given by Eq. (17.6).

Figure 17.1 Block diagram of the channel model in a digital QAM communication system.

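We also note that the common choice for both the pulse-shaping filters $p_T(t)$ and $p_R(t)$
is the square-root raised-cosine pulse shape

$$p_{\mathrm{srrc}}(t) = \frac{1}{\sqrt{T}}\cdot\frac{\sin\left((1 - \alpha)\frac{\pi t}{T}\right) + \frac{4\alpha t}{T}\cos\left((1 + \alpha)\frac{\pi t}{T}\right)}{\frac{\pi t}{T}\left(1 - \left(\frac{4\alpha t}{T}\right)^2\right)} \tag{17.7}$$

where the parameter $0 \le \alpha \le 1$ is referred to as the roll-off factor. The Fourier transform of
$p_{\mathrm{srrc}}(t)$ is

$$P_{\mathrm{srrc}}(\Omega) = \begin{cases} \sqrt{T}, & |\Omega| \le \frac{(1-\alpha)\pi}{T} \\ \sqrt{T}\cos\left(\frac{T}{4\alpha}\left(|\Omega| - \frac{(1-\alpha)\pi}{T}\right)\right), & \frac{(1-\alpha)\pi}{T} \le |\Omega| \le \frac{(1+\alpha)\pi}{T} \\ 0, & \text{otherwise} \end{cases} \tag{17.8}$$

Note that $P_{\mathrm{srrc}}(\Omega)$ spans over the frequency band $-\frac{(1+\alpha)\pi}{T} \le \Omega \le \frac{(1+\alpha)\pi}{T}$.
In the absence of channel (or when the channel is ideal), $H(\Omega) = P_T(\Omega)P_R(\Omega) =
|P_{\mathrm{srrc}}(\Omega)|^2 = P_{\mathrm{rc}}(\Omega)$, where $P_{\mathrm{rc}}(\Omega)$ is referred to as the raised-cosine pulse shape. It is
straightforward to obtain, from Eq. (17.8),

$$P_{\mathrm{rc}}(\Omega) = \begin{cases} T, & |\Omega| \le \frac{(1-\alpha)\pi}{T} \\ \frac{T}{2}\left\{1 + \cos\left[\frac{T}{2\alpha}\left(|\Omega| - \frac{(1-\alpha)\pi}{T}\right)\right]\right\}, & \frac{(1-\alpha)\pi}{T} \le |\Omega| \le \frac{(1+\alpha)\pi}{T} \\ 0, & \text{otherwise} \end{cases} \tag{17.9}$$

Moreover, in the time domain,

$$p_{\mathrm{rc}}(t) = \operatorname{sinc}\left(\frac{t}{T}\right)\frac{\cos(\pi\alpha t/T)}{1 - 4\alpha^2t^2/T^2} \tag{17.10}$$

where $\operatorname{sinc}(t/T) = \frac{\sin(\pi t/T)}{\pi t/T}$. Note that when the channel is ideal and the received signal
is sampled at the correct timing phase, $h(t) = p_{\mathrm{rc}}(t) = 1$ at t = 0 and is zero at any
other integer multiple of T; hence, it is a Nyquist pulse and thus results in an intersymbol
interference (ISI) free transmission. The presence of channel usually results in an h(t),
which is not Nyquist and thus equalization is needed to remove ISI.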
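The Nyquist property is easy to confirm numerically. The sketch below assumes the raised-cosine pulse in the form $p_{\mathrm{rc}}(t) = \operatorname{sinc}(t/T)\cos(\pi\alpha t/T)/(1 - 4\alpha^2t^2/T^2)$ and samples it at integer multiples of T. The function names are illustrative, and the special case in the code handles the removable singularity at $|t| = T/(2\alpha)$, where the cosine factor divided by the denominator tends to $\pi/4$.

```python
import math

def sinc(u):
    """sin(pi u) / (pi u), with the limiting value 1 at u = 0."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def raised_cosine(t, T, alpha):
    """Raised-cosine pulse p_rc(t) with symbol interval T and roll-off alpha."""
    denom = 1 - (2 * alpha * t / T) ** 2
    if abs(denom) < 1e-12:
        # Removable singularity at |t| = T / (2 alpha): the ratio
        # cos(pi alpha t / T) / denom tends to pi / 4 there.
        return sinc(t / T) * math.pi / 4
    return sinc(t / T) * math.cos(math.pi * alpha * t / T) / denom

T, alpha = 1.0, 0.5     # the values used later in Example 17.1 (T in microseconds)

# Samples at t = kT for k = -5, ..., 5: one at k = 0, zero elsewhere.
samples = [raised_cosine(k * T, T, alpha) for k in range(-5, 6)]
for k, value in zip(range(-5, 6), samples):
    print(f"p_rc({k}T) = {value:.3e}")

# Between the sampling instants the pulse is generally nonzero.
print("p_rc(0.5T) =", raised_cosine(0.5 * T, T, alpha))
```

For $\alpha = 0.5$ the singular points fall exactly on $t = \pm T$, which is why the limit has to be handled explicitly in this example.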

Using h(t) to denote the inverse Fourier transform of $H(\Omega)$, Figure 17.1 simplifies to
Figure 17.2. Now considering Eq. (17.2), one will find that

$$x(t) = \sum_{n} s(n)h(t - nT) \tag{17.11}$$

Figure 17.2 The simplified diagram of the channel model of a digital QAM communication
system.

In the next section, we use Eq. (17.11) to develop a discrete time channel model that will
be used in the rest of this chapter. However, before proceeding with the next section, it
will be instructive to present an example of a multipath wireless channel.


Example 17.1 


A multipath passband communication channel that operates at the carrier frequency 1 GHz
has two paths with the gains of 1 and 0.5 and the delays of 0 and 0.75 μs, respectively.
The transmit and receive filters are square-root raised-cosine pulse shapes designed for
T = 1 μs and α = 0.5.


(i) Derive an equation for the equivalent baseband channel transfer function H(Ω).
(ii) Find H(Ω) if the delay of the second path changes to 0.75025 μs.
(iii) In both cases, present the plots of the magnitude and phase response of H(Ω).
(iv) In both cases, obtain and plot the impulse response of the baseband equivalent of
the passband channel.


Solution: 


(i) The impulse response of the channel is c(t) = δ(t) + 0.5δ(t − 0.75), with t in μs. Hence,
C(Ω) = 1 + 0.5e^{−j0.75Ω}. Substituting this in Eq. (17.6), we obtain

$$H(\Omega) = \left(1 + 0.5e^{-j0.75(\Omega+\Omega_c)}\right)P_{\rm rc}(\Omega) = \left(1 + 0.5e^{-j0.75\Omega}\right)P_{\rm rc}(\Omega)$$

where P_rc(Ω) is the raised-cosine pulse shape (17.9) with T = 1 μs and α = 0.5,
and we have noted that, for Ω_c = 2π × (10³ MHz), e^{−j0.75Ω_c} = 1.
(ii) Following the same line of derivation as in (i), in this case, we obtain

$$H(\Omega) = \left(1 + 0.5e^{-j0.75025\Omega}e^{-j\pi/2}\right)P_{\rm rc}(\Omega) = \left(1 - 0.5j\,e^{-j0.75025\Omega}\right)P_{\rm rc}(\Omega)$$

(iii) The plots are shown in Figure 17.3. As seen, a small change in one of the path
delays can significantly affect the baseband equivalent of the channel response.
(iv) Taking the inverse Fourier transforms of the results in (i) and (ii), we obtain the
following:

for case (i), h(t) = p_rc(t) + 0.5p_rc(t − 0.75)

and

for case (ii), h(t) = p_rc(t) − 0.5j p_rc(t − 0.75025)

These are plotted in Figure 17.4. It is interesting to note that whereas case (i)
leads to a real-valued impulse response, case (ii) has a complex-valued impulse
response. We note that, in general, h(t) is expected to be a complex-valued function
of time. Another important observation here is that a relatively small change in
the delay of one of the paths has a profound impact on the response of the equivalent
baseband channel in both the time and frequency domains.

Figure 17.3 The equivalent baseband channel response of two examples of a two-path wireless
channel, in the frequency domain.
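
The sensitivity seen in Example 17.1 comes entirely from the carrier phase factor picked up
by the delayed path. The following Python sketch (an illustration, not taken from the book's
scripts) evaluates that factor for the two delays; the helper name carrier_phase_factor is
hypothetical.

```python
import cmath

FC_HZ = 1e9                              # carrier frequency of the example (1 GHz)
DELAYS_S = [0.75e-6, 0.75025e-6]         # second-path delay in cases (i) and (ii)

def carrier_phase_factor(delay_s, fc_hz=FC_HZ):
    """Return exp(-j*Omega_c*delay), the complex gain contributed by the carrier."""
    cycles = fc_hz * delay_s             # number of carrier cycles within the delay
    frac = cycles - round(cycles)        # whole cycles do not change the phase
    return cmath.exp(-2j * cmath.pi * frac)

# Equivalent baseband gain of the second path: 0.5 times the carrier phase factor.
second_path_gains = [0.5 * carrier_phase_factor(d) for d in DELAYS_S]
```

A delay change of 0.25 ns is a quarter of a carrier cycle at 1 GHz, so the second-path gain
rotates from 0.5 to −0.5j even though the pulse p_rc(t − delay) barely moves.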


Generalizing the above example, if the channel is a multipath channel that is characterized
by the impulse response

$$c(t) = \sum_{i=0}^{M-1} a_i\,\delta(t - q_i) \tag{17.12}$$

where M is the number of paths, and a_i and q_i are the ith path gain and delay, respectively,
the baseband equivalent of the channel is obtained as

$$h(t) = \sum_{i=0}^{M-1} a_i e^{-j\Omega_c q_i}\, p_{\rm rc}(t - q_i) \tag{17.13}$$

Figure 17.4 The equivalent baseband channel response of two examples of a two-path wireless
channel, in the time domain.


17.2 Discrete Time Channel Model and Equalizer Structures 


Consider the case where the continuous time signal x(t) is sampled at a rate L times faster 
than the symbol rate 1/T, and use x(n) to denote the sample value of x(t) at t = nT/L. 
Also, using h(n) to denote the sample value of h(t) at t = nT/L, one will obtain from 
Eq. (17.11) 

$$x(n) = \sum_{m} s(m)\,h(n - mL) \tag{17.14}$$


Moreover, if we assume that the channel contains some additive noise (which we ignored 
in the presentations of Figure 17.1, for simplicity of the derivations), one may modify 
Eq. (17.14) as

$$x(n) = \sum_{m} s(m)\,h(n - mL) + v(n) \tag{17.15}$$


where v(n) is the channel noise. 

In a digital communication system, the samples x(n) are passed through an adaptive 
equalizer whose coefficients are adjusted to undo the distortion caused by the channel 
and thus deliver (almost) ISI free estimates of s(n) at its output. Considering the channel 
equation (17.15), and adding the equalizer after the channel model, the discrete-time 
system model of a digital communication system is obtained as the one presented in 
Figure 17.5. The interpolator block adds L − 1 zeros after each symbol s(n), changing
the sample rate from 1/T to L/T. Passing the upsampled sequence through H(z) and
adding the channel noise, v(n), implements Eq. (17.15). The equalizer W(z) is an
FIR filter with the tap weights w_0, w_1, ..., w_{N−1}, and the estimates ŝ(n − Δ) of the
transmitted data symbols, where Δ is the delay caused by the combination of the channel
and equalizer, are obtained by taking the decimated samples at the output of W(z).
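
The interpolate-then-filter form of Eq. (17.15) can be sketched as follows. This is a minimal
Python illustration; the function names and the three-tap channel h are hypothetical and are
not taken from the book's MATLAB code.

```python
def expand(symbols, L):
    """Insert L-1 zeros after each symbol (the interpolator of Figure 17.5)."""
    out = []
    for s in symbols:
        out.append(s)
        out.extend([0] * (L - 1))
    return out

def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    y = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y

def channel_output(symbols, h, L, noise=None):
    """x(n) = sum_m s(m) h(n - mL) + v(n), as in Eq. (17.15)."""
    x = convolve(expand(symbols, L), h)
    if noise is not None:
        # noise is assumed to have the same length as x
        x = [xi + vi for xi, vi in zip(x, noise)]
    return x

s = [1 + 1j, -1 + 1j, 1 - 1j]      # three 4-QAM symbols
h = [1.0, 0.5, 0.25]               # hypothetical T/2-spaced samples of h(t)
x = channel_output(s, h, L=2)      # noise-free received samples at rate 2/T
```

For example, x[2] collects s(0)h(2) + s(1)h(0), which is exactly the n = 2 term of
Eq. (17.14) with L = 2.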


17.2.1 Symbol-Spaced Equalizer 


If one chooses to use an equalizer whose tap spacing is equal to the spacing T of the 
transmitted symbols, Figure 17.5 reduces to Figure 17.6. This corresponds to the case 
where the interpolation and decimation factor L is equal to 1. It may be noted that 
the system setup in Figure 17.6 is similar to the equalizer setup that was presented in 
Chapter 6 (Section 6.4.2). For obvious reasons, in this setup, W(z) is referred to
as a symbol-spaced equalizer.

It is also worth noting that because the transmit signal has a transmission bandwidth
that is larger than 1/T (e.g., when p_T(t) is a square-root raised-cosine pulse shape with a
roll-off factor α, the transmission bandwidth is (1 + α)/T), the sampled signal x(n) (the
equalizer input) is subject to aliasing. This, as discussed later (Section 17.4), may result
in poor performance if the input signal to the equalizer is sampled at a timing phase
that results in a significant spectral attenuation within the aliased band.

Figure 17.5 The discrete-time system model of a digital communication system.
[Block labels: interpolator, equivalent baseband channel, equalizer, decimator, decision device.]

Figure 17.6 A digital communication system with a symbol-spaced equalizer.


17.2.2 Fractionally Spaced Equalizer 


As noted earlier, symbol-spaced equalizers are subject to aliasing, and an
improper choice of the timing phase of the samples to the equalizer may lead to significant
performance degradation. This problem is avoided by adopting an equalizer whose input
is sampled at a rate above the Nyquist rate. Such equalizers are referred to as fractionally
spaced. As, in practice, 0 ≤ α ≤ 1, an equalizer tap-spacing of T/2 is always sufficient
to avoid aliasing. However, when α < 1, one may set the tap-spacing equal to some
value between T/2 and T. For example, when α = 0.5, an equalizer tap-spacing of 2T/3
is sufficient to avoid aliasing. Also, in this case, one may prefer the spacing 2T/3 over
T/2, because for a fixed time span of the equalizer, the choice of 2T/3 reduces the
number of tap weights by a factor of 4/3 (= (2/3)/(1/2)).
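
The arithmetic behind these tap-spacing choices can be sketched in a few lines of Python.
The function names and the 12T equalizer span below are assumptions chosen only to make the
4/3 ratio concrete.

```python
from fractions import Fraction

def max_tap_spacing(alpha):
    """Largest alias-free tap spacing, in units of T, for a roll-off factor alpha.

    The transmission bandwidth is (1 + alpha)/T, so the sample rate must be at
    least (1 + alpha)/T, i.e. the spacing can be at most T/(1 + alpha).
    """
    return 1 / (1 + Fraction(alpha))

def taps_for_span(span_in_T, spacing_in_T):
    """Number of taps needed to cover a time span (both arguments in units of T)."""
    return int(Fraction(span_in_T) / Fraction(spacing_in_T))

spacing = max_tap_spacing(Fraction(1, 2))            # alpha = 0.5
n_half = taps_for_span(12, Fraction(1, 2))           # tap spacing T/2
n_two_thirds = taps_for_span(12, Fraction(2, 3))     # tap spacing 2T/3
```

For α = 0.5 the limit is 2T/3, and covering the same 12T span takes 18 taps instead of 24.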

Implementation of a fractionally spaced equalizer with the tap-spacing T/2 is straight- 
forward and follows the block diagram of Figure 17.5, with L = 2. Figure 17.7 presents 
the detail of implementation of this equalizer. The equalizer taps are at the spacing T/2. 
Moreover, as the equalizer output is decimated twofold, the equalizer output needs to be 
calculated only for even values of n. 

When the tap-spacing is a less trivial fraction of T, for example, when it is equal to KT/L
for a pair of integers K and L, the equalizer detail is only slightly more involved than that
in Figure 17.7. The detail of a fractionally spaced equalizer with the tap-spacing 2T/3 is
presented in Figure 17.8. This should serve as an example that can be easily extended to
other cases as well. Here, the samples x(n) are spaced at T/3, and the tap-spacing 2T/3
is achieved by replacing each single delay block z^{−1} by a double delay block z^{−2}. Also,
samples of the output are calculated for values of n that are integer multiples of 3.
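
A fractionally spaced equalizer with tap-spacing KT/L can be sketched as below. This is a
Python illustration under the stated assumptions (input samples spaced T/L apart, samples
before the start of the record taken as zero); the function names and the test data are
hypothetical.

```python
def fse_output(x, w, K, n):
    """Equalizer output at input-sample index n.

    x : received samples spaced T/L apart
    w : tap weights w_0 .. w_{N-1}
    K : tap spacing in input samples, so that the tap spacing is K*T/L
    """
    y = 0
    for i, wi in enumerate(w):
        idx = n - i * K                  # each tap looks K input samples further back
        if 0 <= idx < len(x):
            y += wi * x[idx]
    return y

def fse_symbol_outputs(x, w, K, L):
    """Evaluate the equalizer only at n = 0, L, 2L, ... (one output per symbol)."""
    return [fse_output(x, w, K, n) for n in range(0, len(x), L)]

x = list(range(1, 13))                   # hypothetical T/3-spaced samples
w = [1.0, 0.5, 0.25]
y = fse_symbol_outputs(x, w, K=2, L=3)   # tap spacing 2T/3, outputs every 3 samples
```

Setting K = 1 and L = 2 gives the T/2-spaced case of Figure 17.7, with outputs computed for
even n only.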


Figure 17.7 Details of a fractionally spaced equalizer with the tap-spacing T/2.

Figure 17.9 Block diagram of a decision feedback equalizer.


17.2.3 Decision Feedback Equalizer 


The symbol-spaced and the fractionally spaced equalizer structures that were presented
earlier belong to the class of linear equalizers/filters. The decision feedback (DF)
equalizers, on the other hand, belong to the class of nonlinear filters. Figure 17.9 presents a
block diagram of a DF equalizer. It consists of a feedforward filter W_FF(z) and a feedback
filter W_FB(z). The feedforward filter W_FF(z) is a linear (transversal) filter and can be a
symbol-spaced or fractionally spaced one. It equalizes the channel so that its response
up to the decision device has no precursor ISI, but may contain some postcursor ISI.
The postcursor ISI is canceled by passing the output of the decision device through the
feedback filter with the transfer function

$$W_{\rm FB}(z) = \sum_{i=1}^{M} w_{{\rm FB},i}\, z^{-i} \tag{17.16}$$

where w_{FB,i} are the feedback tap weights and M is the number of feedback taps.
When the channel contains some deep fade(s) in its amplitude response, a linear
equalizer has to compensate for the fade(s). This, along with amplifying the desired (data)
signal, will amplify the channel noise as well. In other words, equalization also leads
to noise enhancement and thus may result in a significant degradation of the receiver
performance. The DF equalizers take advantage of the degree of freedom provided by
the feedback filter W_FB(z) and avoid such noise enhancement. Hence, a DF equalizer
is expected to perform better than its linear counterpart. However, DF
equalizers suffer from another potential problem that sometimes leads to inferior
performance when they are compared with their linear counterparts. Any decision error at the
output of the decision device feeds back through the feedback filter and can result in
more errors. This phenomenon, which is known as error propagation, is a serious
problem that in many cases makes engineers reluctant to adopt DF equalizers for
their applications.
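
The cancelation of postcursor ISI by the feedback filter can be sketched as follows. This is a
minimal symbol-spaced Python illustration, assuming that the feedback filter output is
subtracted from the feedforward output ahead of the decision device and that the tap weights
are fixed rather than adapted; the two-tap channel and all names are hypothetical.

```python
def slicer(v):
    """Decision device: nearest point of the 4-QAM alphabet {+-1 +- j}."""
    return complex(1.0 if v.real >= 0 else -1.0, 1.0 if v.imag >= 0 else -1.0)

def dfe(x, w_ff, w_fb):
    """DF equalizer: feedforward taps act on x, feedback taps on past decisions."""
    decisions = []
    for n in range(len(x)):
        ff = sum(w * x[n - i] for i, w in enumerate(w_ff) if n - i >= 0)
        fb = sum(w * decisions[n - i]
                 for i, w in enumerate(w_fb, start=1) if n - i >= 0)
        decisions.append(slicer(ff - fb))
    return decisions

s = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j, 1 + 1j, -1 - 1j]
h = [1.0, 0.6]                     # channel with a single postcursor, no precursor
x = [sum(h[i] * s[n - i] for i in range(len(h)) if n - i >= 0)
     for n in range(len(s))]
s_hat = dfe(x, w_ff=[1.0], w_fb=[0.6])
```

Here the feedback tap equals the postcursor, so each correct past decision removes the ISI
exactly without any filtering (and hence any amplification) of the channel noise. A wrong
decision would instead inject an error into the next symbol, which is the error propagation
mechanism described above.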


17.3 Timing Recovery 


Before the implementation of any equalizer, the received signal should be sampled at a
rate that is an integer multiple of the symbol rate 1/T (for example, at an interval T_s = T/L
or, equivalently, at a rate f_s = L/T), and at a correct timing phase. This process is
often referred to as timing recovery. As in any adaptive algorithm, a timing recovery
algorithm is developed based on a cost function whose optimization leads to the desired
timing information. There are two classes of timing recovery methods: (i) nondata-aided
methods; and (ii) data-aided methods. The discussion in this section is limited to the
nondata-aided methods. Interested readers should refer to the more detailed texts for a
broader discussion on the subject of timing recovery; for example, see Farhang-Boroujeny
(2010) and Meyr et al. (1998).

The nondata-aided timing recovery algorithms are commonly developed based on a
cost function that is built on the following property of digital data signals.
The power of the signal samples taken at the symbol interval T is a periodic function
of the timing phase, τ, with the period of T. This power value is defined as a function,
say, ρ(τ), and τ is adapted to maximize ρ(τ). As discussed later, a variety of algorithms
can be devised for this maximization, and such maximization in many cases, although
not always, leads to a near optimum timing phase for the symbol-spaced equalizers. The
fractionally spaced equalizers, as we shall see later, are insensitive to the timing phase.


17.3.1 Cost Function 


Ignoring the channel noise and recalling the equivalent baseband model of Figure 17.2,
we obtain

$$x(t) = \sum_{n=-\infty}^{\infty} s(n)\,h(t - nT) \tag{17.17}$$

where s(n)'s are data symbols and T is the baud/symbol interval. The data symbols are
in general from a complex-valued constellation.

The key idea in the development of nondata-aided timing recovery stems from the following
observation. If s(n)'s are a set of independent and identically distributed symbols with
mean of zero and variance of σ_s², we obtain the ensemble mean-squared/power of x(τ) as

$$\rho(\tau) = E[|x(\tau)|^{2}] = \sigma_s^{2}\sum_{n=-\infty}^{\infty} |h(\tau - nT)|^{2} \tag{17.18}$$

By direct inspection, we note that ρ(τ) is a periodic signal with period of T. It thus can
be expanded using Fourier series as

$$\rho(\tau) = \sum_{n=-\infty}^{\infty} \rho_n e^{j2\pi n\tau/T} \tag{17.19}$$

where ρ_n's are the Fourier series coefficients given by

$$\rho_n = \frac{1}{T}\int_{0}^{T} \rho(\tau)\, e^{-j2\pi n\tau/T}\, d\tau \tag{17.20}$$

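Substituting Eq. (17.18) in Eq. (17.20), and using k for the summation index to avoid
confusion with the Fourier series index n, we obtain

$$\begin{aligned}
\rho_n &= \frac{1}{T}\int_{0}^{T}\left(\sigma_s^{2}\sum_{k=-\infty}^{\infty} |h(\tau - kT)|^{2}\right) e^{-j2\pi n\tau/T}\, d\tau \\
&= \frac{\sigma_s^{2}}{T}\sum_{k=-\infty}^{\infty}\int_{0}^{T} |h(\tau - kT)|^{2}\, e^{-j2\pi n\tau/T}\, d\tau \\
&= \frac{\sigma_s^{2}}{T}\int_{-\infty}^{\infty} |h(\tau)|^{2}\, e^{-j2\pi n\tau/T}\, d\tau
\end{aligned} \tag{17.21}$$

where the second line follows by changing the order of the integral and summation, and
the third line follows by introducing the change of variable τ − kT to τ, noting that
e^{−j2πnk} = 1, and noting that the resulting integrals add up to a single integral over
the range −∞ < τ < ∞.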

Next, from the theory of Fourier transform, we recall that for any pair of functions x(t)
and y(t),

$$\mathcal{F}[x(t)y(t)] = \frac{1}{2\pi}\, X(\Omega) \star Y(\Omega) \tag{17.22}$$

where F[·] denotes Fourier transform, ⋆ denotes convolution, and X(Ω) and Y(Ω)
are the Fourier transforms of x(t) and y(t), respectively. Also, we note that in Eq.
(17.21), the integral of |h(τ)|² e^{−j2πnτ/T} over −∞ < τ < ∞ is the Fourier transform of
|h(t)|² = h(t)h*(t) at Ω = 2πn/T. Hence, substituting x(t) = h(t) and y(t) = h*(t) in
Eq. (17.22), and noting that F[h*(t)] = H*(−Ω), we get

$$\rho_n = \frac{\sigma_s^{2}}{2\pi T}\, H(\Omega) \star H^{*}(-\Omega)\Big|_{\Omega = 2\pi n/T} = \frac{\sigma_s^{2}}{2\pi T}\int_{-\infty}^{\infty} H(\Omega)\, H^{*}\!\left(\Omega - \frac{2\pi n}{T}\right) d\Omega \tag{17.23}$$

From Eq. (17.23), the following observations are made:

• $$\rho_0 = \frac{\sigma_s^{2}}{2\pi T}\int_{-\infty}^{\infty} |H(\Omega)|^{2}\, d\Omega \tag{17.24}$$
  is a real and positive number.

• $$\rho_1 = \frac{\sigma_s^{2}}{2\pi T}\int_{-\infty}^{\infty} H(\Omega)\, H^{*}\!\left(\Omega - \frac{2\pi}{T}\right) d\Omega \tag{17.25}$$
  and
  $$\rho_{-1} = \frac{\sigma_s^{2}}{2\pi T}\int_{-\infty}^{\infty} H(\Omega)\, H^{*}\!\left(\Omega + \frac{2\pi}{T}\right) d\Omega = \frac{\sigma_s^{2}}{2\pi T}\int_{-\infty}^{\infty} H\!\left(\Omega - \frac{2\pi}{T}\right) H^{*}(\Omega)\, d\Omega = \rho_1^{*} \tag{17.26}$$

• In almost all practical channels, the excess bandwidth of the transmit pulse-shaping
  filter is less than 100%; e.g., when p_T(t) is a square-root raised-cosine pulse shape
  with a roll-off factor α < 1. In such cases, H(Ω) = 0 for |Ω| > 2π/T. Using this, one
  finds that ρ_n = 0 for |n| > 1.


These observations imply that

$$\rho(\tau) = \rho_0 + \rho_1 e^{j2\pi\tau/T} + \rho_{-1} e^{-j2\pi\tau/T} = \rho_0 + 2|\rho_1|\cos\!\left(\frac{2\pi}{T}\tau + \angle\rho_1\right) \tag{17.27}$$

where |ρ_1| and ∠ρ_1 are the amplitude and phase of ρ_1, respectively.
Next, we note that, for a given timing phase τ, E[|x(τ + nT)|²] = ρ(τ), for ∀n. Hence,
we may define

$$\rho(\tau) = E[|x(\tau + nT)|^{2}] \tag{17.28}$$

as the timing recovery cost function. From Eq. (17.27), it is obvious that ρ(τ) is a periodic
function of τ with a period of T, a maximum value of ρ_0 + 2|ρ_1|, and a minimum
value of ρ_0 − 2|ρ_1|. Figure 17.10 presents a typical plot of one period of ρ(τ) over the
interval −T/2 ≤ τ ≤ T/2. Either the minima or the maxima of ρ(τ) can provide a reference
point to which one may choose to lock the symbol clock rate. This, as presented later,
allows one to develop algorithms that keep the receiver synchronized with the incoming
data symbols.
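
The sinusoidal shape predicted by Eq. (17.27) can be checked numerically from the sum in
Eq. (17.18). The Python sketch below assumes an ideal channel, so that h(t) is the
raised-cosine pulse of Eq. (17.10), with T = 1, α = 0.5, and unit symbol variance; the
function names and the truncation of the sum are choices made for this illustration.

```python
import math

def rc_pulse(t, T=1.0, alpha=0.5):
    """Raised-cosine pulse of Eq. (17.10); the 0/0 points are replaced by their limits."""
    x = t / T
    sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    denom = 1.0 - (2.0 * alpha * x) ** 2
    if abs(denom) < 1e-12:
        return (math.pi / 4.0) * sinc
    return sinc * math.cos(math.pi * alpha * x) / denom

def rho(tau, T=1.0, alpha=0.5, sigma_s2=1.0, n_max=200):
    """Eq. (17.18) with the infinite sum truncated to |n| <= n_max."""
    return sigma_s2 * sum(rc_pulse(tau - n * T, T, alpha) ** 2
                          for n in range(-n_max, n_max + 1))

rho_max = rho(0.0)      # sampling at the pulse peak
rho_mid = rho(0.25)     # a quarter of a symbol away
rho_min = rho(0.5)      # half a symbol away
```

For this real, symmetric pulse the phase of ρ_1 is zero, so the cosine in Eq. (17.27) peaks
at τ = 0, passes through ρ_0 at τ = T/4, and bottoms out at τ = T/2.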


17.3.2 The Optimum Timing Phase 


When the equalizer is a fractionally spaced one, its performance is independent of the
timing phase. Hence, there will be no optimum timing phase. All one may choose to do
is to lock to a point of the cost function ρ(τ); the maximum and the minimum points of
the cost function are usually easier points to lock to. However, when the equalizer is a
symbol-spaced one, its performance varies significantly with the timing phase τ. It turns
out that often (but not necessarily always) a good estimate of the optimum timing phase
is obtained by maximizing the cost function ρ(τ).

Figure 17.10 A typical plot of one period of the timing recovery cost function ρ(τ).

When the equalizer is a symbol-spaced one and the signal samples at the equalizer
input are sampled at a timing phase τ, the equivalent discrete time baseband impulse
response of the channel is given by the sequence

$$h(n, \tau) = h(nT + \tau) \tag{17.29}$$

Let H(e^{jω}, τ) denote the Fourier transform of the sample sequence h(n, τ), and note that
H(e^{jω}, τ) is the frequency response of the channel.

Next, we let τ = 0 and recall from the theory of sampling that the frequency response
H(e^{jω}, 0) is related to its continuous time counterpart H(Ω), that is, the Fourier transform
of h(t), as

$$H(e^{j\omega}, 0) = \frac{1}{T}\sum_{k=-\infty}^{\infty} H\!\left(\frac{\omega - 2k\pi}{T}\right) \tag{17.30}$$

We also take note of the following points. H(e^{jω}, 0) is a periodic function of ω with the
period of 2π. Considering the common case where the pulse-shaping filter p_T(t) has an
excess bandwidth of 100% or smaller, only the adjacent terms on the right-hand side of
Eq. (17.30) overlap. Hence,

$$H(e^{j\omega}, 0) = \frac{1}{T}\left(H\!\left(\frac{\omega}{T}\right) + H\!\left(\frac{\omega - 2\pi}{T}\right)\right), \quad \text{for } 0 \le \omega \le 2\pi. \tag{17.31}$$

Moreover, one may note that changing the timing phase from 0 to τ, in the frequency
domain, has the impact of replacing H(Ω) by e^{jΩτ}H(Ω). This, in turn, implies that

$$H(e^{j\omega}, \tau) = \frac{e^{j\omega\tau/T}}{T}\left(H\!\left(\frac{\omega}{T}\right) + e^{-j2\pi\tau/T}\, H\!\left(\frac{\omega - 2\pi}{T}\right)\right), \quad \text{for } 0 \le \omega \le 2\pi \tag{17.32}$$

Now, for the purpose of a simple demonstration, let us consider the case where the
channel is ideal, that is, c(t) = δ(t), and note that in this case Eq. (17.6) implies that
H(Ω) = P_T(Ω)P_R(Ω) is a Nyquist pulse, for example, a raised-cosine pulse. Figure 17.11
presents a set of plots of the magnitude response |H(e^{jω}, τ)| as τ varies in the range
of 0 to T/2. When τ = 0, |H(e^{jω}, τ)| = 1, for 0 ≤ ω ≤ 2π. For τ = T/2, |H(e^{jω}, τ)|
experiences a null at ω = π. For other values of τ in the range of 0 to T/2, |H(e^{jω}, τ)|
lies between these two extreme cases.
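
The two extreme cases just described follow directly from Eq. (17.32). The Python sketch
below evaluates the aliased response for an ideal channel, taking H(Ω) to be the
raised-cosine spectrum of Eq. (17.9) with T = 1 and α = 0.5; the function names are
hypothetical.

```python
import cmath
import math

def P_rc(Omega, T=1.0, alpha=0.5):
    """Raised-cosine spectrum of Eq. (17.9)."""
    a = abs(Omega)
    lo = (1 - alpha) * math.pi / T
    hi = (1 + alpha) * math.pi / T
    if a <= lo:
        return T
    if a <= hi:
        return 0.5 * T * (1 + math.cos(T / (2 * alpha) * (a - lo)))
    return 0.0

def H_aliased(omega, tau, T=1.0, alpha=0.5):
    """Eq. (17.32) for 0 <= omega <= 2*pi, with H(Omega) = P_rc(Omega)."""
    main = P_rc(omega / T, T, alpha)
    alias = P_rc((omega - 2 * math.pi) / T, T, alpha)
    phase = cmath.exp(1j * omega * tau / T)
    return phase / T * (main + cmath.exp(-2j * math.pi * tau / T) * alias)

flat = [abs(H_aliased(2 * math.pi * k / 64, 0.0)) for k in range(65)]   # tau = 0
null = abs(H_aliased(math.pi, 0.5))                                     # tau = T/2
between = abs(H_aliased(math.pi, 0.25))                                 # tau = T/4
```

At τ = 0 the two spectral copies add in phase and the response is flat; at τ = T/2 they are
in antiphase at ω = π and cancel; at τ = T/4 the magnitude at ω = π is 1/√2.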

Figure 17.12 presents a block diagram of the discrete time baseband equivalent of the
channel pertinent to our discussion in this section. The input to the channel is a random
process consisting of the transmitted data symbols s(n). Assuming that the data symbols
have a zero mean, are independent of each other, and have a variance of unity, one will
find the power spectral density of the channel output as

$$\Phi_{xx}(e^{j\omega}, \tau) = |H(e^{j\omega}, \tau)|^{2} \tag{17.33}$$

Also, recalling Eq. (2.60),

$$\rho(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{xx}(e^{j\omega}, \tau)\, d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} |H(e^{j\omega}, \tau)|^{2}\, d\omega \tag{17.34}$$

where ρ(τ) = E[|x(n)|²] denotes the signal power at the equalizer input for a timing
phase τ. Recall that this is the timing recovery cost function that was defined
earlier.

[Figure 17.11, vertical axis: |H(e^{jω}, τ)|]

Figure 17.12 Channel model for generating the signal x(n) that feeds a symbol-spaced equalizer.
H(e^{jω}, τ) is given by Eq. (17.32).

Next, considering Eq. (17.34), one may find that for the present case, where the channel
is ideal, ρ(τ) attains its maximum value at τ = 0 and reduces as τ varies from 0
to T/2, reaching its minimum at τ = T/2. This can be easily understood if one notices
from Figure 17.11 that the area under |H(e^{jω}, τ)|² is at its maximum when τ = 0 and
reduces as τ varies from 0 to T/2. Moreover, it may be noted that when τ = 0, h(n, τ)
reduces to an impulse in the discrete time, n. Hence, the input to the equalizer is free
of ISI, and there is no need for equalization. When τ ≠ 0, the dip in the magnitude
of |H(e^{jω}, τ)| has to be compensated for by an equalizer with a gain larger than one
around ω = π. This, clearly, results in some noise enhancement and thus in
a poorer performance of the receiver. We may hence conclude that when the channel
is ideal, the optimum timing phase is the one that avoids any attenuation in the
aliased response H(e^{jω}, τ) of the channel. One may imagine that this conclusion is true
even in the cases where the channel is nonideal, viz., a choice of timing phase that
results in a significant attenuation of H(e^{jω}, τ) over the aliased portion of the spectrum
may lead to a significant noise enhancement and thus a poor performance of the receiver.
However, unfortunately, as the numerical results presented in Section 17.4.2 show, the
maximization of the cost function ρ(τ) does not necessarily avoid nulls in the aliased
spectra and hence cannot always avoid possible poor performance of the symbol-spaced
equalizers.


17.3.3 Improving the Cost Function 


Figure 17.13 presents plots of the cost function ρ(τ) for an ideal channel where H(Ω) =
P_rc(Ω) and P_rc(Ω) is the Fourier transform of a raised-cosine pulse shape. The plots are
given for three values of the roll-off factor α = 0.25, 0.5, and 1. An important point
to note here is that the variation of ρ(τ) with τ reduces as α decreases. Also, in an
adaptive setting, a stochastic gradient (similar to the one in the LMS algorithm) is used to
search for the timing phase that maximizes ρ(τ). In addition, we note that the variance
of a stochastic gradient is approximately proportional to the magnitude of the underlying
signal power. Relating these points, one may argue that the stochastic gradients used for
timing recovery become less reliable as α decreases.

Figure 17.13 Plots of the timing recovery cost function ρ(τ) for an ideal channel and a
raised-cosine pulse shape with three values of the roll-off factor α.


To overcome the above problem, we proceed with an intuitive reasoning of why the
gradient of ρ(τ) reduces with α and from there suggest a method of modifying the cost
function ρ(τ) such that it will be less dependent on α. Referring to Figure 17.11, one
finds that the variation of the received signal power as a function of the timing phase, τ,
is a direct consequence of augmentation or cancelation of the aliased signal components
as τ varies. Moreover, if we note that the amount of aliased components reduces with
α, it becomes obvious that the variation of ρ(τ) with τ reduces with α. Extending this
argument, we may suggest that, to obtain a cost function that is less dependent on α, one
should only concentrate on the signal power over the band of the aliased components.
This can be achieved easily by passing the received signal through a bandpass filter that is
centered around 1/(2T) and choosing the output power of this filter as the timing recovery
cost function.

Figure 17.14 presents a set of plots of a modified cost function that is obtained by
taking the T-spaced samples of the received signal and passing them through a single-pole
high-pass filter with the transfer function

$$B(z) = \frac{\sqrt{1 - \beta^{2}}}{1 + \beta z^{-1}} \tag{17.35}$$

where 0 < β < 1 determines the bandwidth of the filter and the factor √(1 − β²) is to
normalize the power gain of the filter, for a white input, to unity. We refer to this cost
function as ρ_β(τ). We also note that ρ(τ) can be thought of as a special case of ρ_β(τ),
which is obtained by choosing β = 0. For the results presented in Figure 17.14, β is set
equal to 0.95. As expected, unlike ρ(τ), which varies significantly with α, ρ_β(τ), for β
close to 1, remains nearly the same for all values of α.

Figure 17.14 Plots of the improved timing recovery cost function for an ideal channel and a
raised-cosine pulse shape with three values of the roll-off factor α. The parameter β is set equal
to 0.95.
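
The behavior of the high-pass filter B(z) can be checked with a short Python sketch. It
assumes the recursion y(n) = √(1 − β²)x(n) − βy(n − 1), which is the form used in the
MATLAB listings of this section; the function name and the 500-sample truncation are
choices made for this illustration.

```python
import math

def highpass(x, beta):
    """y(n) = sqrt(1 - beta^2) * x(n) - beta * y(n-1): single pole at z = -beta."""
    g = math.sqrt(1.0 - beta * beta)
    y, prev = [], 0.0
    for xn in x:
        prev = g * xn - beta * prev
        y.append(prev)
    return y

BETA = 0.95
impulse = [1.0] + [0.0] * 499
b = highpass(impulse, BETA)                     # (truncated) impulse response

power_gain = sum(v * v for v in b)              # power gain for a white input
dc_gain = abs(sum(b))                           # |B| at omega = 0
nyquist_gain = abs(sum(v * (-1) ** n for n, v in enumerate(b)))   # |B| at omega = pi
```

The power gain comes out as one, while the gain at ω = π (the band of the aliased
components, around 1/(2T)) is much larger than the gain at ω = 0.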


17.3.4 Algorithms 


Application of the cost function ρ(τ) (or, equivalently, ρ_β(τ)) leads to a variety of timing
recovery/tracking algorithms. Here, we present two timing recovery algorithms that are
built based on this cost function.


17.3.5 Early-Late Gate Timing Recovery 


Early-late gate is one of the most common methods of timing recovery and can be applied
to a variety of cost functions. To develop an early-late gate timing recovery algorithm
based on the cost function ρ(τ), we proceed as follows.

We set the goal of timing recovery to choose a timing phase τ = τ_opt, which maximizes
ρ(τ). Figure 17.15 presents the components relevant to the early-late gate timing
recovery. We note that when τ = τ_opt and δτ is a timing phase deviation, ρ(τ + δτ) −
ρ(τ − δτ) = 0. On the other hand, for a nonoptimum timing phase τ and a small δτ, we
note that ρ(τ + δτ) − ρ(τ − δτ) > 0 when τ < τ_opt, and ρ(τ + δτ) − ρ(τ − δτ) < 0
when τ > τ_opt. The former case is demonstrated in Figure 17.15.

In the light of the above observation, one may propose the following update equation
for adaptive adjustment of the timing phase:

$$\tau(n+1) = \tau(n) + \mu\left(\rho(\tau(n) + \delta\tau) - \rho(\tau(n) - \delta\tau)\right) \tag{17.36}$$

where μ is a step-size parameter. Moreover, we note that, in practice, the cost function
ρ(τ) is not available and can only be estimated based on the observed signal samples,
say, by taking the average of the squares of a few recent samples of x(t). Or, we may
follow the philosophy of the LMS algorithm and simply use |x(t)|² at t = τ + nT as an
estimate of ρ(τ). Applying such coarse estimates, we obtain the update equation

$$\tau(n+1) = \tau(n) + \mu\left(|x(\tau(n) + \delta\tau + nT)|^{2} - |x(\tau(n) - \delta\tau + nT)|^{2}\right). \tag{17.37}$$

Figure 17.15 A demonstration of the early-late timing recovery information.


The equations used for the realization of the early-late gate timing recovery with the
modified cost function are summarized as follows:

$$x_{1}(n) = \sqrt{1 - \beta^{2}}\; x(\tau(n) + \delta\tau + nT) - \beta x_{1}(n-1) \tag{17.38}$$
$$x_{-1}(n) = \sqrt{1 - \beta^{2}}\; x(\tau(n) - \delta\tau + nT) - \beta x_{-1}(n-1) \tag{17.39}$$
$$\tau(n+1) = \tau(n) + \mu\left(|x_{1}(n)|^{2} - |x_{-1}(n)|^{2}\right). \tag{17.40}$$

Note that x_1(n) and x_{−1}(n) are, respectively, sequences obtained by passing the signal
samples x(τ(n) + δτ + nT) and x(τ(n) − δτ + nT) through the high-pass filter B(z).
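
Before turning to the noisy, sample-based recursions, it may help to see how the idealized
update of Eq. (17.36) behaves when the exact cost function of Eq. (17.27) is available. The
Python sketch below is only an illustration: the values ρ_0 = 0.875, 2|ρ_1| = 0.125,
∠ρ_1 = 0, the step size, and the function names are assumptions.

```python
import math

def rho(tau, T=1.0, rho0=0.875, rho1_mag=0.0625, rho1_phase=0.0):
    """Cost function in the closed form of Eq. (17.27)."""
    return rho0 + 2 * rho1_mag * math.cos(2 * math.pi * tau / T + rho1_phase)

def early_late(tau0, mu, dtau, iterations):
    """Iterate Eq. (17.36) using the exact (noise-free) cost function."""
    tau = tau0
    history = [tau]
    for _ in range(iterations):
        tau += mu * (rho(tau + dtau) - rho(tau - dtau))
        history.append(tau)
    return history

history = early_late(tau0=0.3, mu=0.5, dtau=0.05, iterations=200)
```

Starting from τ = 0.3T, the early sample sees less power than the late one, so τ is pushed
down toward the maximum of ρ(τ) at τ = 0. In the sample-based recursions, the same drift
is present on average, with the stochastic estimates adding jitter around it.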

To explore the performance of the timing recovery recursion (17.37) (and Eq. (17.38)),
we use the MATLAB script that is listed below. This is also available on the
accompanying website under the name "TxRxQAM.m." This script, which will be extended and
used throughout the rest of this chapter for a number of experiments, contains the following
features/components:

• As the continuous time cannot be simulated on a computer, a dense set of samples of
  a signal is used as a representation of its continuous time version. Here, L samples per
  symbol interval is considered as dense. In the script below, we have set L equal
  to 100.

• The data symbols s(n) are randomly selected from a four-point QAM constellation,
  with the alphabet {+1 + j, −1 + j, −1 − j, +1 − j}.

• Samples of the square-root raised-cosine pulse shape of Eq. (17.7) are taken as the
  impulse response of the transmit and receive filters, p_T(t) and p_R(t), respectively.
  This is generated by calling the function "sr_cos_p.m," which is available on the
  accompanying website.

• The sequence of the data symbols s(n) is up-sampled by a factor of L, before being
  passed through p_T(t), by adding L − 1 zeros after each symbol. This task is taken care
  of by the function "expander.m," also available on the accompanying website.

• The modulator, channel, and demodulator are implemented according to the sequence
  of the blocks in Figure 17.1.

• The samples of the equivalent baseband impulse response of the transmission system
  from the input s(t) to the output x(t) can be obtained by letting s(t) = δ(t)
  (equivalently, letting s(n) = 1 for n = 0, and s(n) = 0 otherwise) and decimating the samples
  of x(t) to the desired rate.

• The last segment of the MATLAB script (an attachment to "TxRxQAM.m," here) takes
  the continuous time signal x(t) (having its samples at the dense grid of time points) and
  applies the early-late gate timing recovery algorithm to it. On the accompanying website,
  the MATLAB script "TxRxQAM.m" with this attachment is named "TxRxQAMELG.m."


% This MATLAB script provides a skeleton for simulation of a quadrature
% amplitude modulated (QAM) transmission system.

T=0.0001;      % Symbol/baud period
L=100;         % Number of samples per symbol period
Ts=T/L;        % A dense sampling interval (approx to continuous time)
fc=100000;     % Carrier frequency at the transmitter
Dfc=0;         % Carrier frequency offset
               % (carrier at transmitter minus carrier at receiver)
phic=0;        % Carrier phase offset
alpha=0.5;     % Roll-off factor for the square-root raised cosine filter
sigma_v=0;     % Standard deviation of channel noise
c=1;           % Channel impulse response

%%% 4QAM symbols %%%
N=1000; s=sign(randn(N,1))+1i*sign(randn(N,1));

%%% Transmitter filtering %%%
pT=sr_cos_p(6*L,L,alpha);        % Transmit filter
xbbT=conv(expander(s,L),pT);     % Band-limited baseband transmit signal

%%% MODULATION %%%
t=[0:length(xbbT)-1]'*Ts;        % dense set of time points
xT=2*real(exp(1i*2*pi*fc*t).*xbbT);

%%% CHANNEL (including additive noise) %%%
xR=conv(c,xT); xR=xR+sigma_v*randn(size(xR));   % Received signal

%%% DEMODULATION %%%
t=[0:length(xR)-1]'*Ts;          % dense set of time points
xbbR=exp(-1i*(2*pi*(fc+Dfc)*t-phic)).*xR;

%%% Receiver filtering %%%
pR=pT;                           % Receive filter: same square-root raised-cosine pulse
x=conv(xbbR,pR);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Early-late gate timing recovery %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
beta=0;        % Should be changed for the modified algorithm
mu0=0.005;     % Step-size parameter
dtau=12;       % (delta tau) times L
mu=mu0*(L/4)/dtau;               % Adjusted step-size
kk=1;xp=0;xm=0;
start=5*L+1;   % To drop the transient of x(t) at the beginning
tau=0.3*ones(1,floor((length(x)-start)/L));     % Initialize the timing offset
% The timing offset tau is adjusted as the algorithm proceeds.
for k=start:L:length(tau)*L
    tauT=round(tau(kk)*L);
    xp=sqrt(1-beta^2)*x(k+tauT+dtau)-beta*xp;   % early sample through B(z), Eq. (17.38)
    xm=sqrt(1-beta^2)*x(k+tauT-dtau)-beta*xm;   % late sample through B(z), Eq. (17.39)
    tau(kk+1)=tau(kk)+mu*(abs(xp)^2-abs(xm)^2); % timing update, Eq. (17.40)
    kk=kk+1;
end
figure, axes('position',[0.1 0.25 0.7 0.5]), plot(tau(1:kk),'k')
xlabel('Iteration Number, n'), ylabel('\tau(n)')

Figure 17.16 presents a set of plots of the variation of τ as the early-late gate timing
recovery operates according to the above MATLAB script. The results are for choices of
δτ = T/L, 10T/L, 20T/L, and 25T/L (respectively, dtau = 1, 10, 20, and 25, in the
MATLAB script). From these results, and further experiments that may be performed,
one should observe that the convergence behavior of the early-late gate timing recovery
algorithm remains independent of the parameter δτ over a relatively wide range of values
within the interval T/L (a relatively small value) to 10T/L (= T/10). However, as δτ
increases and approaches 25T/L (= T/4), some degradation in the convergence rate is
observed. This observation can be explained if we note that the ratio (ρ(τ(n) + δτ) −
ρ(τ(n) − δτ))/δτ reduces as δτ increases (Figure 17.15).

Figure 17.16 Plots of the timing phase update of the early-late gate timing recovery algorithm.
The parameters used are β = 0, μ_0 = 0.01, and four choices of δτ. Each plot is an ensemble
average of 100 independent runs.

Figure 17.17 Comparison of the original and the improved early-late gate timing recovery
algorithm for a single run. The parameters for the original algorithm are β = 0, μ_0 =
0.01, and δτ = T/L. The parameters for the improved algorithm are β = 0.9, μ_0 = 0.0025,
and δτ = T/L.

Figure 17.17 presents a pair of learning curves of τ(n) for the early-late gate timing
recovery with β = 0 and its modified version with β = 0.9. The rest of the parameters
have been chosen so that the two cases have similar convergence rates. As seen, the choice
β = 0.9 leads to an algorithm with significantly less jitter after it has converged. This
result, of course, is in line with our discussion in Section 17.3.3.


17.3.6 Gradient-Based Algorithm 


Using the gradient algorithm, the following recursion may be used to find the timing
phase τ that maximizes the timing recovery cost function ρ(τ),

$$\tau(n+1) = \tau(n) + \mu\,\frac{\partial\rho(\tau)}{\partial\tau} \tag{17.41}$$

where μ is a step-size parameter.

The early-late gate timing recovery recursion (17.36) (and, thus, Eq. (17.37)) may also
be thought of as a gradient-based algorithm, where the approximation

$$\frac{\partial\rho(\tau)}{\partial\tau} \approx \frac{|x(\tau(n) + \delta\tau + nT)|^{2} - |x(\tau(n) - \delta\tau + nT)|^{2}}{2\delta\tau} \tag{17.42}$$

is used in Eq. (17.41) and the ratio μ/(2δτ) is redefined as the step-size parameter. We
also note that in the early-late gate timing recovery algorithm, each iteration requires three
samples of x(t): the desired sample x(τ(n) + nT) and the early and late samples
x(τ(n) + δτ + nT) and x(τ(n) − δτ + nT).

Here, we present a lower complexity timing recovery algorithm that operates based on 
only two samples x(t(n) + nT) and x(t(n) +nT + T/2) for each update of t(n). We 
begin with using Eq. (17.28) to obtain 


dp(T) 4r . [2m 
= = ——F lei! sin (Fr +201) (17.43) 


(17.42) 


Also, it has been shown in Farhang-Boroujeny (1994b) that when $\beta$ is close to, but smaller
than, 1,

$$E[\Re\{x_0(\tau(n) + nT)\, x_1^*(\tau(n) + nT + T/2)\}] = -k \sin\left(\frac{2\pi}{T}\tau + \angle\rho_1\right) \tag{17.44}$$

where $x_0(\tau(n) + nT)$ and $x_1(\tau(n) + nT + T/2)$, respectively, are the signal sequences
obtained by passing $x(\tau(n) + nT)$ and $x(\tau(n) + nT + T/2)$ through the transfer function
$B(z)$ of Eq. (17.35), and $k$ is a positive constant. Following the same approach as the one
used in the LMS algorithm and, also, in Eq. (17.37), we use
$\Re\{x_0(\tau(n) + nT)\, x_1^*(\tau(n) + nT + T/2)\}$ as a stochastic estimate proportional to the
gradient $\partial\rho(\tau)/\partial\tau$ in Eq. (17.41). This leads to the update equation

$$\tau(n+1) = \tau(n) + \mu\, \Re\{x_0(\tau(n) + nT)\, x_1^*(\tau(n) + nT + T/2)\} \tag{17.45}$$


The above algorithm can be implemented according to the following MATLAB script: 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Gradient-based timing recovery %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
beta=0.9;      % Should be changed for the modified algorithm
mu=0.0025;     % Step-size parameter
kk=1;xp=0;xm=0;
start=5*L+1;   % To drop the transient of x(t) at the beginning
tau=0.3*ones(1,floor((length(x)-start)/L)); % Initialize the timing offset
% The timing offset tau is adjusted as the algorithm proceeds.
for k=start:L:length(tau)*L
    tauT=round(tau(kk)*L);                    % timing offset in samples
    xp=sqrt(1-beta^2)*x(k+tauT)-beta*xp;      % filtered sample at tau(n)+nT
    xm=sqrt(1-beta^2)*x(k+tauT+L/2)-beta*xm;  % filtered sample at tau(n)+nT+T/2
    tau(kk+1)=tau(kk)+mu*real(xp*xm');        % update of Eq. (17.45)
    kk=kk+1;
end


On the accompanying website, the MATLAB script “TxRxQAM.m” with this attachment
is called “TxRxQAMGB.m.”

Figure 17.18 A plot of the timing phase update of the gradient-based timing recovery algorithm.
The parameters used are $\beta = 0.9$ and $\mu = 0.0025$.


Figure 17.18 presents a typical plot of $\tau(n)$ when the gradient-based algorithm is used.
The simulation setup is similar to the one used to generate Figures 17.16 and 17.17 and
the parameters used here are $\beta = 0.9$ and $\mu = 0.0025$. From this result, we may observe
that the gradient-based timing recovery algorithm is somewhat slower than the early-late
gate algorithm presented earlier. This clearly is the price paid for a less complex algorithm.


17.4 Equalizers Design and Performance Analysis 


When one has access to the statistics of the transmitted data symbols, the channel response, 
and the statistical characteristics of the channel noise, it is possible to evaluate and 
study the equalizer performance based on the Wiener filters theory that was developed 
in Chapter 3. In this section, we take this approach to develop some insight into the 
performance of the symbol-spaced, fractionally spaced, and DF equalizers. Adaptation 
algorithms for equalizer design as well as design strategies that jointly perform the tasks 
of carrier frequency and timing phase recovery are presented in the following sections. 


17.4.1 Wiener-Hopf Equation for Symbol-Spaced Equalizers


To develop the Wiener-Hopf equations for symbol-spaced equalizers, we begin with the
system model of Figure 17.19. This follows the block diagram shown in Figure 17.6. In
addition, here, we have shown the details of how the noise sequence $\nu(n)$ is generated.
The sequence $\nu(n)$ is obtained by passing the channel noise $\nu_c(n)$ through the receiver
front-end filter, $p_R(n)$. The latter is usually a square-root raised-cosine filter matched to
its counterpart at the transmitter and is usually run at a rate that is a multiple of the
symbol rate. In Figure 17.19, we have assumed that $p_R(n)$ is run at twice the symbol
rate. We model $\nu_c(n)$ as a white sequence with variance $\sigma_{\nu_c}^2$. The transmit data stream
$s(n)$ is also modeled as a white sequence with variance $\sigma_s^2$.

Figure 17.19 System setup for a symbol-spaced equalizer.

The equalizer tap weights $w_i$ should be selected to minimize the output error $e(n)$. The
error $e(n)$ originates from two independent sources: the data symbols $s(n)$ and the channel
noise $\nu_c(n)$. We may thus write $e(n) = e^{(s)}(n) + e^{(\nu_c)}(n)$, where $e^{(s)}(n)$ and $e^{(\nu_c)}(n)$ are
the errors that originate from $s(n)$ and $\nu_c(n)$, respectively. The power spectral density
$\Phi_{ee}(z)$ of $e(n)$, thus, can be written as

$$\Phi_{ee}(z) = \Phi_{ee}^{(s)}(z) + \Phi_{ee}^{(\nu_c)}(z) \tag{17.46}$$

where $\Phi_{ee}^{(s)}(z)$ and $\Phi_{ee}^{(\nu_c)}(z)$ are the power spectral densities of $e^{(s)}(n)$ and $e^{(\nu_c)}(n)$,
respectively. $\Phi_{ee}^{(s)}(z)$ is obtained straightforwardly if we note that the transfer function between
$s(n)$ and $e(n)$ is $H(z)W(z) - z^{-\Delta}$ and apply Eq. (2.80). This leads to

$$\Phi_{ee}^{(s)}(z) = \Phi_{ss}(z)\,|H(z)W(z) - z^{-\Delta}|^2 \tag{17.47}$$


To obtain $\Phi_{ee}^{(\nu_c)}(z)$, we note that the path between $\nu_c(n)$ and $e^{(\nu_c)}(n)$ can be redrawn as in
Figure 17.20. This is obtained by making use of the theory of multirate signal processing
(Vaidyanathan, 1993) and noting that the cascade of $P_R(z)$ and the twofold decimator
can be replaced by the corresponding polyphase structure, as presented in Figure 17.20.
Here, $P_R^0(z)$ and $P_R^1(z)$ are the polyphase components of $P_R(z)$. They are obtained by
separating the even and odd ordered powers of $z$ in $P_R(z)$ and expanding it as

$$P_R(z) = P_R^0(z^2) + z^{-1} P_R^1(z^2). \tag{17.48}$$

Also, the sequences $\nu_c^0(n)$ and $\nu_c^1(n)$ are the polyphase components of $\nu_c(n)$, defined as

$$\{\nu_c^0(n)\} = \{\ldots, \nu_c(-2), \nu_c(0), \nu_c(2), \ldots\} \tag{17.49}$$

and

$$\{\nu_c^1(n)\} = \{\ldots, \nu_c(-1), \nu_c(1), \nu_c(3), \ldots\}. \tag{17.50}$$

Figure 17.20 The path between $\nu_c(n)$ and the output error $e(n)$.


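Next, we note that as $\nu_c(n)$ is a white process, $\nu_c^0(n)$ and $\nu_c^1(n)$ are two uncorrelated
sequences with the same power spectral density as that of $\nu_c(n)$. Hence, applying
Eq. (2.80), one obtains

$$\Phi_{ee}^{(\nu_c)}(z) = \Phi_{\nu_c\nu_c}(z)\left(|P_R^0(z)W(z)|^2 + |P_R^1(z)W(z)|^2\right) \tag{17.51}$$

Combining Eqs. (17.47) and (17.51), and noting that $\Phi_{ss}(z) = \sigma_s^2$ and $\Phi_{\nu_c\nu_c}(z) = \sigma_{\nu_c}^2$
(as $s(n)$ and $\nu_c(n)$ are white processes), we get

$$\Phi_{ee}(z) = \sigma_s^2\,|H(z)W(z) - z^{-\Delta}|^2 + \sigma_{\nu_c}^2\left(|P_R^0(z)W(z)|^2 + |P_R^1(z)W(z)|^2\right) \tag{17.52}$$

On the other hand, one may note that

$$\begin{aligned}
\xi &= E[|e(n)|^2] \\
    &= \frac{1}{2\pi j}\oint \Phi_{ee}(z)\,\frac{dz}{z} \\
    &= \sigma_s^2\,\frac{1}{2\pi j}\oint |H(z)W(z) - z^{-\Delta}|^2\,\frac{dz}{z}
       + \sigma_{\nu_c}^2\,\frac{1}{2\pi j}\oint |P_R^0(z)W(z)|^2\,\frac{dz}{z} \\
    &\quad + \sigma_{\nu_c}^2\,\frac{1}{2\pi j}\oint |P_R^1(z)W(z)|^2\,\frac{dz}{z}
\end{aligned} \tag{17.53}$$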

Following the principle of Wiener filtering, to design $W(z)$, we set the goal of minimizing
the mean-squared error $\xi$. To this end, we first convert Eq. (17.53) to its time-domain
equivalent. For this purpose, we define the error sequences $e_s(n)$, $e_{\nu_c}^0(n)$, and $e_{\nu_c}^1(n)$ whose
z transforms are, respectively,

$$E_s(z) = H(z)W(z) - z^{-\Delta} \tag{17.54}$$

$$E_{\nu_c}^0(z) = P_R^0(z)W(z) \tag{17.55}$$

and

$$E_{\nu_c}^1(z) = P_R^1(z)W(z) \tag{17.56}$$

and note that

$$e_s(n) = h(n) \star w_n - d(n) \tag{17.57}$$

$$e_{\nu_c}^0(n) = p_R^0(n) \star w_n \tag{17.58}$$

and

$$e_{\nu_c}^1(n) = p_R^1(n) \star w_n \tag{17.59}$$


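where $\star$ denotes linear convolution and $d(n)$ is the inverse z transform of $z^{-\Delta}$. We note
that unlike $e^{(s)}(n)$ and $e^{(\nu_c)}(n)$, which have indefinite length and are power signals, the
sequences $e_s(n)$, $e_{\nu_c}^0(n)$, and $e_{\nu_c}^1(n)$, defined here, may have finite duration and are energy
signals. These, respectively, are the impulse responses between the input sequences $s(n)$,
$\nu_c^0(n)$, and $\nu_c^1(n)$ and the output error sequence $e(n)$. Moreover, using Parseval's
relation, the time-domain equivalent of Eq. (17.53) is obtained as

$$\xi = \sigma_s^2 \sum_n |e_s(n)|^2 + \sigma_{\nu_c}^2 \sum_n |e_{\nu_c}^0(n)|^2 + \sigma_{\nu_c}^2 \sum_n |e_{\nu_c}^1(n)|^2 \tag{17.60}$$

Next, we define the column vectors

$$\mathbf{e}_s = \begin{bmatrix} e_s(0) \\ e_s(1) \\ e_s(2) \\ \vdots \end{bmatrix}, \quad
\mathbf{e}_{\nu_c}^0 = \begin{bmatrix} e_{\nu_c}^0(0) \\ e_{\nu_c}^0(1) \\ e_{\nu_c}^0(2) \\ \vdots \end{bmatrix}, \quad
\mathbf{e}_{\nu_c}^1 = \begin{bmatrix} e_{\nu_c}^1(0) \\ e_{\nu_c}^1(1) \\ e_{\nu_c}^1(2) \\ \vdots \end{bmatrix},$$

$$\mathbf{w} = \begin{bmatrix} w_0^* \\ w_1^* \\ w_2^* \\ \vdots \end{bmatrix} \quad \text{and} \quad
\mathbf{d} = \begin{bmatrix} d(0) \\ d(1) \\ d(2) \\ \vdots \end{bmatrix} \tag{17.61}$$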


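Note that, here, we have followed the convention set in Chapter 3 and have defined the
elements of the tap-weight vector $\mathbf{w}$ as the conjugates of the tap weights of the equalizer.
Also, whereas the index for the tap weights appears as a subscript (e.g., $w_i$), for the time
sequences such as $e(n)$ the time index $n$ is enclosed in parentheses.

Using the definitions (17.61), one finds that Eqs. (17.57), (17.58), and (17.59) can be
expanded, respectively, as

$$\mathbf{e}_s = \mathbf{H}\mathbf{w}^* - \mathbf{d} \tag{17.62}$$

$$\mathbf{e}_{\nu_c}^0 = \frac{\sigma_{\nu_c}}{\sigma_s}\,\mathbf{P}^0\mathbf{w}^* \tag{17.63}$$

and

$$\mathbf{e}_{\nu_c}^1 = \frac{\sigma_{\nu_c}}{\sigma_s}\,\mathbf{P}^1\mathbf{w}^* \tag{17.64}$$

where

$$\mathbf{H} = \begin{bmatrix} h(0) & 0 & 0 & \cdots \\ h(1) & h(0) & 0 & \cdots \\ h(2) & h(1) & h(0) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{17.65}$$

$$\mathbf{P}^0 = \begin{bmatrix} p_R^0(0) & 0 & 0 & \cdots \\ p_R^0(1) & p_R^0(0) & 0 & \cdots \\ p_R^0(2) & p_R^0(1) & p_R^0(0) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{17.66}$$

and

$$\mathbf{P}^1 = \begin{bmatrix} p_R^1(0) & 0 & 0 & \cdots \\ p_R^1(1) & p_R^1(0) & 0 & \cdots \\ p_R^1(2) & p_R^1(1) & p_R^1(0) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{17.67}$$

Note that the noise-related vectors in Eqs. (17.63) and (17.64) include the scaling factor
$\sigma_{\nu_c}/\sigma_s$, so that the three terms of Eq. (17.60) can be collected under the common
factor $\sigma_s^2$. Also, if we define

$$\mathbf{e} = \begin{bmatrix} \mathbf{e}_s \\ \mathbf{e}_{\nu_c}^0 \\ \mathbf{e}_{\nu_c}^1 \end{bmatrix} \quad \text{and} \quad
\mathbf{Q} = \begin{bmatrix} \mathbf{H} \\ \frac{\sigma_{\nu_c}}{\sigma_s}\mathbf{P}^0 \\ \frac{\sigma_{\nu_c}}{\sigma_s}\mathbf{P}^1 \end{bmatrix}$$

Eqs. (17.62), (17.63), and (17.64) can be combined as

$$\mathbf{e} = \mathbf{Q}\mathbf{w}^* - \mathbf{d} \tag{17.68}$$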


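where, here, $\mathbf{d}$ is the vector $\mathbf{d}$ defined before with a number of zeros appended at the end
of it to have the same length as $\mathbf{Q}\mathbf{w}^*$. Using Eq. (17.68), Eq. (17.60) reduces to

$$\xi = \sigma_s^2\,\mathbf{e}^{\rm H}\mathbf{e} \tag{17.69}$$

where the superscript H denotes Hermitian.
Substituting Eq. (17.68) in Eq. (17.69) and expanding, we obtain

$$\xi = \sigma_s^2\left(\mathbf{w}^{\rm T}(\mathbf{Q}^{\rm H}\mathbf{Q})\mathbf{w}^* - \mathbf{w}^{\rm T}(\mathbf{Q}^{\rm H}\mathbf{d}) - (\mathbf{Q}^{\rm H}\mathbf{d})^{\rm H}\mathbf{w}^* + \mathbf{d}^{\rm H}\mathbf{d}\right) \tag{17.70}$$

As $\xi$ is real-valued, applying a conjugate sign to both sides of Eq. (17.70) leads to

$$\xi = \sigma_s^2\left(\mathbf{w}^{\rm H}(\mathbf{Q}^{\rm T}\mathbf{Q}^*)\mathbf{w} - \mathbf{w}^{\rm H}(\mathbf{Q}^{\rm T}\mathbf{d}^*) - (\mathbf{Q}^{\rm T}\mathbf{d}^*)^{\rm H}\mathbf{w} + \mathbf{d}^{\rm H}\mathbf{d}\right). \tag{17.71}$$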


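This is similar to the cost function expressions derived in Chapter 3 and elsewhere in
this book. It is thus straightforward to show that the optimum tap-weight vector of the
equalizer (i.e., the minimizer of $\xi$) is given by

$$\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p} \tag{17.72}$$

where $\mathbf{R} = \mathbf{Q}^{\rm T}\mathbf{Q}^*$ and $\mathbf{p} = \mathbf{Q}^{\rm T}\mathbf{d}^*$. Also, the minimum mean-squared error, that is, the
value of $\xi$ when $\mathbf{w} = \mathbf{w}_o$, is obtained as

$$\xi_{\min} = \sigma_s^2\,(1 - \mathbf{w}_o^{\rm H}\mathbf{p}) \tag{17.73}$$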
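
To make Eqs. (17.68), (17.72), and (17.73) concrete, the following Python sketch builds the matrix Q and solves the normal equations for a small example. It is only an illustration under simplifying assumptions that are not part of the text: all quantities are real-valued (so the conjugates drop out), the function names `conv_matrix`, `solve`, and `wiener_equalizer` are hypothetical, and the example channel and polyphase filters are made-up values rather than those used in “equalizer_eval.m.”

```python
def conv_matrix(h, n_taps):
    """Lower-triangular Toeplitz (convolution) matrix of h with n_taps columns."""
    rows = len(h) + n_taps - 1
    return [[h[r - c] if 0 <= r - c < len(h) else 0.0 for c in range(n_taps)]
            for r in range(rows)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and nonsingular (no singularity check is done).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def wiener_equalizer(h, p0, p1, n_taps, delta, sigma_s, sigma_nu):
    """Real-valued sketch of Eqs. (17.68), (17.72), and (17.73)."""
    g = sigma_nu / sigma_s
    # Q = [H; g*P0; g*P1], stacked row-wise as in Eq. (17.68)
    q = conv_matrix(h, n_taps)
    q += [[g * v for v in row] for row in conv_matrix(p0, n_taps)]
    q += [[g * v for v in row] for row in conv_matrix(p1, n_taps)]
    d = [0.0] * len(q)
    d[delta] = 1.0                      # d(n) is a unit impulse at n = delta
    r = [[sum(row[i] * row[j] for row in q) for j in range(n_taps)]
         for i in range(n_taps)]        # R = Q^T Q
    p = [sum(row[i] * d[k] for k, row in enumerate(q))
         for i in range(n_taps)]        # p = Q^T d
    w = solve(r, p)                     # Eq. (17.72)
    mmse = sigma_s ** 2 * (1.0 - sum(wi * pi for wi, pi in zip(w, p)))
    return w, mmse                      # mmse follows Eq. (17.73)


# Hypothetical two-path channel with trivial noise-shaping polyphase filters.
w, mmse = wiener_equalizer([1.0, 0.5], [1.0], [0.0], n_taps=5, delta=4,
                           sigma_s=1.0, sigma_nu=0.1)
print(w, mmse)
```

For an ideal channel (h = [1]) and no noise, the sketch returns a pure delay and zero MMSE, as one would expect. A practical implementation would use a linear algebra library and complex arithmetic, as the MATLAB script of Section 17.4.2 does.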
Figure 17.21 System setup for a half symbol-spaced equalizer.


Fractionally Spaced Equalizer 


For brevity and clarity of presentation, here, we limit our discussion to a particular case of
a fractionally spaced equalizer where the equalizer taps are at one-half of symbol spacing,
that is, when $L = 2$ and $K = 1$. Figure 17.21 presents a system setup that may be used
for the study of a fractionally spaced equalizer with tap spacing of half a symbol
interval.

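Following Figure 17.21 and a similar vector formulation as in the case of the symbol-spaced
equalizer, here, we obtain

$$\mathbf{e}_s = \mathbf{H}\mathbf{w}^* - \mathbf{d} \tag{17.74}$$

and

$$\mathbf{e}_{\nu_c} = \frac{\sigma_{\nu_c}}{\sigma_s}\,\mathbf{P}\mathbf{w}^* \tag{17.75}$$

with

$$\mathbf{H} = \begin{bmatrix} h(0) & 0 & 0 & 0 & 0 & \cdots \\ h(2) & h(1) & h(0) & 0 & 0 & \cdots \\ h(4) & h(3) & h(2) & h(1) & h(0) & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{17.76}$$

and

$$\mathbf{P} = \begin{bmatrix} p_R(0) & 0 & 0 & \cdots \\ p_R(1) & p_R(0) & 0 & \cdots \\ p_R(2) & p_R(1) & p_R(0) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{17.77}$$

Moreover, if we define

$$\mathbf{e} = \begin{bmatrix} \mathbf{e}_s \\ \mathbf{e}_{\nu_c} \end{bmatrix} \quad \text{and} \quad
\mathbf{Q} = \begin{bmatrix} \mathbf{H} \\ \frac{\sigma_{\nu_c}}{\sigma_s}\mathbf{P} \end{bmatrix}$$

Eqs. (17.74) and (17.75) can be combined to obtain an equation similar to
Eq. (17.68), which will then lead to similar results to those in Eqs. (17.69), (17.71),
(17.72), and (17.73).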


Decision Feedback Equalizer 


Figure 17.22 presents a system setup that may be used for the study of a DF equalizer. For 
simplicity of presentation, we have chosen to limit our discussion here to the case where 
the feedforward filter is a symbol-spaced one. The extension of the results to fractionally 
spaced equalizers is not difficult and is left as an exercise for the interested readers. 

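Figure 17.22 System setup for a decision feedback equalizer.

Following Figure 17.22, one will find that, here, Eqs. (17.63) and (17.64) should be
modified as

$$\mathbf{e}_{\nu_c}^0 = \frac{\sigma_{\nu_c}}{\sigma_s}\,\mathbf{P}^0\mathbf{w}_{\rm FF}^* \tag{17.78}$$

and

$$\mathbf{e}_{\nu_c}^1 = \frac{\sigma_{\nu_c}}{\sigma_s}\,\mathbf{P}^1\mathbf{w}_{\rm FF}^* \tag{17.79}$$

respectively, and Eq. (17.57) as

$$e_s(n) = h(n) \star w_{{\rm FF},n} - \delta(n - \Delta - 1) \star w_{{\rm FB},n} - d(n). \tag{17.80}$$

Moreover, the vector formulation of Eq. (17.80) is obtained as

$$\mathbf{e}_s = \mathbf{H}\mathbf{w}_{\rm FF}^* - \mathbf{G}\mathbf{w}_{\rm FB}^* - \mathbf{d} \tag{17.81}$$

where $\mathbf{H}$ and $\mathbf{d}$ have the same definitions as before, $\mathbf{w}_{\rm FF}$ and $\mathbf{w}_{\rm FB}$ are the tap-weight
vectors of the feedforward and feedback filters, respectively, and

$$\mathbf{G} = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \\ 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix} \tag{17.82}$$

The top part of $\mathbf{G}$ is a zero matrix with $M$ columns (where $M$, as defined earlier, is the
length of the feedback filter, $\mathbf{w}_{\rm FB}$) and $\Delta + 1$ rows, and the bottom part of $\mathbf{G}$ is an identity
matrix of size $M \times M$.

Next, if we define

$$\mathbf{e} = \begin{bmatrix} \mathbf{e}_s \\ \mathbf{e}_{\nu_c}^0 \\ \mathbf{e}_{\nu_c}^1 \end{bmatrix} \quad \text{and} \quad
\mathbf{Q} = \begin{bmatrix} \mathbf{H} & -\mathbf{G} \\ \frac{\sigma_{\nu_c}}{\sigma_s}\mathbf{P}^0 & \mathbf{0} \\ \frac{\sigma_{\nu_c}}{\sigma_s}\mathbf{P}^1 & \mathbf{0} \end{bmatrix}$$

Eqs. (17.78), (17.79), and (17.81) can be combined to obtain an equation similar to Eq.
(17.68), which will then lead to similar results to those in Eqs. (17.69), (17.71), (17.72),
and (17.73).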


17.4.2 Numerical Examples 


The MATLAB script “equalizer_eval.m,” below, also available on the accompanying
website, puts together the above results and evaluates the performance of a
symbol-spaced equalizer, a half symbol-spaced equalizer, and a DF equalizer with a
symbol-spaced feedforward filter. The baseband equivalent channel $h(t)$ is evaluated
using Eq. (17.13). In each case, the timing phase of the samples fed to the equalizer is
varied over one symbol interval. As explained earlier, the fractionally spaced equalizer
should be insensitive to the timing phase. The symbol-spaced equalizer, on the other
hand, is expected to show some sensitivity to the timing phase. The sensitivity of the DF
equalizer to timing phase is examined and commented on through numerical examples.

We note that the performance of an equalizer depends on the delay $\Delta$. However, when
the equalizer length is sufficiently long, the following choice of $\Delta$ provides a near optimal
performance:

$$\Delta = \frac{1}{2}\,(\text{length of channel} + \text{length of equalizer}) \tag{17.83}$$

where the lengths are in units of the symbol interval. Hence, whereas a symbol-spaced
equalizer with $N$ tap weights has a length of $N$ symbol intervals, a half symbol-spaced
equalizer with $N$ tap weights has a length of $N/2$ symbol intervals. The number of
feedback taps in the DF filter $W_{\rm FB}(z)$, $M$, is chosen to remove all the postcursor samples
of the channel response at the output of the feedforward filter, $W_{\rm FF}(z)$. The reader is
encouraged to read carefully through the MATLAB script “equalizer_eval.m” to
see how the parameters $\Delta$ and $M$ are calculated. The reader is also encouraged to explore
the script to understand how the various equations have been programmed.
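
As a quick illustration of how $\Delta$ and $M$ follow from the lengths involved, the Python sketch below mirrors the two expressions used for the DF equalizer in “equalizer_eval.m.” The function names and the example lengths are hypothetical. One detail worth noting when porting such code: MATLAB's `round` rounds halves away from zero, whereas Python's built-in `round` rounds halves to the nearest even integer, so a small helper is used here.

```python
import math


def matlab_round(x):
    """Round a non-negative number the way MATLAB's round does (halves go up)."""
    return math.floor(x + 0.5)


def delay_and_feedback_length(channel_len, ff_len):
    """Return (delta, m) for a symbol-spaced feedforward filter.

    channel_len: number of symbol-spaced channel samples (length(h0) in the script)
    ff_len:      number of feedforward taps, N
    """
    delta = matlab_round((channel_len + ff_len) / 2)   # Eq. (17.83)
    m = channel_len + ff_len - 1 - delta - 1           # feedback taps after the cursor
    return delta, m


# Hypothetical lengths, chosen only to show the arithmetic.
print(delay_and_feedback_length(19, 21))
print(delay_and_feedback_length(20, 21))
```

The combined response of the channel and the feedforward filter has `channel_len + ff_len - 1` samples; $\Delta + 1$ of them are at or before the cursor, and the remaining $M$ are the postcursor samples that the feedback filter is sized to cancel.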


%
% This script evaluates the minimum mean-squared error (MMSE) of the
% symbol-spaced and half symbol-spaced equalizers. The equivalent baseband
% channel h(t) is obtained according to (17.13). The MMSE values are then
% calculated, following the formulations in Section 17.4.
%
clear all,close all
%% parameters %%
T=0.0001; L=100; Ts=T/L; fs=1/Ts; fc=100000;
alpha=0.5; sigmas=1; sigmanuc=0.01;
% The second assignment of c overrides the first; comment out one of the two.
c=[0.5 zeros(1,60) 1 zeros(1,137) 0.3];   % More common channel
c=[1 zeros(1,67) 0.75 zeros(1,145) 0.4];  % Less common channel
c=c/sqrt(c*c');
pT=sr_cos_p(16*L,L,alpha)';
pR=pT;
%% Construction of the equivalent baseband channel %%
p=conv(pT,pR);
c=c.*exp(-j*2*pi*[0:length(c)-1]*Ts*fc);
h=conv(c,p);
pR=sqrt(L/2)*pR(1:L/2:end);
%
%%%%%%%%%%%%%%%%%%%%%%
% T spaced equalizer %
%%%%%%%%%%%%%%%%%%%%%%
N=21;   % equalizer length = N = 21
for k=1:L
    h0=h(k:L:end);
    Delta=round((length(h(1:L:end))+N)/2);
    C=toeplitz([h0 zeros(1,N-1)],[h0(1) zeros(1,N-1)]);
    P0=toeplitz([pR(1:2:end) zeros(1,N-1)],[pR(1) zeros(1,N-1)]);
    P1=toeplitz([pR(2:2:end) zeros(1,N-1)],[pR(2) zeros(1,N-1)]);
    Q=[C; (sigmanuc/sigmas)*P0; (sigmanuc/sigmas)*P1];
    d=[zeros(Delta,1); 1;zeros(length(Q(:,1))-(Delta+1),1)];
    Ryy=(Q'*Q+1e-14*eye(N));
    pyd=(Q'*d);
    w=Ryy\pyd;
    mmse0(k)=sigmas^2*real(1-w'*pyd);
    spower(k)=sum(abs(h0).^2);
end
%
%%%%%%%%%%%%%%%%%%%%%%%%
% T/2 spaced equalizer %
%%%%%%%%%%%%%%%%%%%%%%%%
N=21;
hf=h(1:L/2:end);Delta=round((length(hf)+N)/4);
for k=1:L
    hf=h(k:L/2:end);
    C=toeplitz([hf zeros(1,N-1)],[hf(1) zeros(1,N-1)]); C=C(1:2:end,:);
    P=toeplitz([pR zeros(1,N-1)],[pR(1) zeros(1,N-1)]);
    Q=[C; (sigmanuc/sigmas)*P];
    d=[zeros(Delta,1); 1;zeros(length(Q(:,1))-(Delta+1),1)];
    Ryy=(Q'*Q+1e-6*eye(N));
    pyd=(Q'*d);
    w=Ryy\pyd;
    mmsef1(k)=sigmas^2*real(1-w'*pyd);
end
%
%%%%%%%%%%%%%%%%
% DF equalizer %
%%%%%%%%%%%%%%%%
N=21;   % FF filter length
for k=1:L
    h0=h(k:L:end);
    Delta=round((length(h(1:L:end))+N)/2);
    C=toeplitz([h0 zeros(1,N-1)],[h0(1) zeros(1,N-1)]);
    M=length(h0)+N-1-Delta-1;   % FB filter length
    P0=toeplitz([pR(1:2:end) zeros(1,N-1)],[pR(1) zeros(1,N-1)]);
    P1=toeplitz([pR(2:2:end) zeros(1,N-1)],[pR(2) zeros(1,N-1)]);
    Q=[[C [zeros(Delta+1,M); eye(M)]];
       [(sigmanuc/sigmas)*P0 zeros(length(P0(:,1)),M)];
       [(sigmanuc/sigmas)*P1 zeros(length(P1(:,1)),M)]];
    d=[zeros(Delta,1); 1;zeros(length(Q(:,1))-(Delta+1),1)];
    Ryy=(Q'*Q+1e-14*eye(N+M));
    pyd=(Q'*d);
    w=Ryy\pyd;
    mmsedf(k)=sigmas^2*real(1-w'*pyd);
end

Figures 17.23 and 17.24 present the results of executing “equalizer_eval.m”
for two examples of the channel $c(t)$, as listed in the above script. The first example,
labeled as “more common channel,” is a case where the timing phase that maximizes
$\rho(\tau)$ leads to an equivalent baseband channel for which the symbol-spaced equalizer
performs well. Most channels in practice have this behavior. The second example,
labeled as “less common channel,” is a case where the timing phase that maximizes $\rho(\tau)$
leads to an equivalent baseband channel for which the symbol-spaced equalizer performs
poorly. The results presented in Figures 17.23 and 17.24 include (i) Panel (a): the signal
power at the input of a symbol-spaced (T-spaced) equalizer as a function of the timing
phase $\tau$; (ii) Panel (b): MMSE (minimum mean-squared error) results of a 21-tap
T-spaced equalizer, a 21-tap T/2-spaced equalizer, a 31-tap T/2-spaced equalizer, and a
DF equalizer with a 21-tap T-spaced feedforward filter; (iii) Panel (c): the magnitude
response of the baseband equivalent channel seen by the T-spaced equalizer for the case
when $\tau$ is optimally selected to minimize the MMSE, and for the worst choice of $\tau$, which
results in the maximum MMSE.

From the results presented in Figures 17.23 and 17.24 and many more that one can
try (by changing the channel, $c(t)$, and the other system parameters in
“equalizer_eval.m”), the following observations are made:


• As one would expect, the performance of the symbol-spaced equalizer can degrade
significantly for some choices of the timing phase. The fractionally spaced equalizer as
well as the DF equalizer, on the other hand, are both almost insensitive to the timing
phase.

• The performance of the symbol-spaced equalizer usually degrades near the timing phase
where the signal power is minimum. It also has a near optimum performance when the
timing phase is chosen to maximize the signal power, that is, the cost function $\rho(\tau)$.
However, this rule is not general. For some cases that one may encounter occasionally,
it is found that the symbol-spaced equalizer performs very poorly when the timing
phase is selected for the power maximization.

• The fractionally spaced equalizer performs equal to or better than the symbol-spaced
equalizer, when both are allowed to have the same time span. For the results presented in
Figures 17.23 and 17.24, two cases of the fractionally spaced equalizer are considered.
These have time spans that are 50% and 75% of that of the symbol-spaced equalizer,
respectively.

• As was predicted before, assuming no error propagation, the DF equalizer always
performs better than its linear counterpart. Also, as observed, there is no significant
difference between a fractionally spaced equalizer and its DF counterpart. Hence, one may
conclude that because of the potential problem of error propagation in DF equalizers,
the choice of a fractionally spaced equalizer should always be preferred.

Figure 17.23 The results of execution of “equalizer_eval.m” for the more common channel.

Figure 17.24 The results of execution of “equalizer_eval.m” for the less common channel.


17.5 Adaptation Algorithms 


Obviously, any of the various adaptive algorithms that were developed in the previous 
chapters can be directly applied to channel equalizers. However, in choosing an adaptive 
algorithm, one should also take note of the following points: 


• The power spectral density of the input to an equalizer may vary vastly, depending on
the channel impulse response; see the examples presented in Section 6.4.2. Hence, the
performance of the LMS algorithm in channel equalization is highly dependent on the
channel and may vary significantly from channel to channel.

• The RLS algorithm, on the other hand, performs independently of the channel. However,
besides its much higher computational complexity, it may suffer from numerical
stability problems, especially for channels that are poorly conditioned, that is, the cases
where the eigenvalue spread of the underlying correlation matrix is large.

• The intermediate algorithms, such as the affine projection algorithm (of Section 6.7)
and the LMS-Newton algorithm (of Section 11.15), may thus be good compromise
choices for adaptation of channel equalizers.

• For the fractionally spaced equalizers, one should take note of the following point. The
input signal to the equalizer is oversampled above its Nyquist rate. This means that
the power spectral density of the input signal to the equalizer over the higher portion of
its frequency band reaches zero. This, in turn, translates to the fact that the underlying
correlation matrix is poorly conditioned. Hence, the problems mentioned in the first two
items above may be more pertinent in the case of the fractionally spaced equalizers.


17.6 Cyclic Equalization 


Training symbols that are known to the receiver are often used at the beginning of
a communication session to synchronize the receiver carrier and symbol clock to the
incoming signal and to adjust the equalizer coefficients to a point near their optimal
values. It turns out that if the training sequence is selected to be periodic and to have a
period equal to the length of the equalizer, the equalizer tap weights can be obtained
almost instantly. Moreover, when such training sequences are used, simple mechanisms
can be developed for fast acquisition of the carrier frequency and symbol clock of the
received signal. Furthermore, any phase offset in the carrier will be taken care of by
the equalizer. In addition, the use of periodic training sequences allows adoption of
a simple mechanism for selection of the time delay $\Delta$ and, thus, alignment of the data
symbol sequences between transmitter and receiver.


17.6.1 Symbol-Spaced Cyclic Equalizer 


Let the periodic sequence $\ldots, s(N-1), s(0), s(1), s(2), \ldots, s(N-1), s(0), \ldots$ be
transmitted through a channel with the symbol-spaced complex baseband equivalent response
$h(n)$. Ignoring the channel noise, this periodic input to the channel results in a periodic
output x(n) with the same period. Now consider an equalizer setup with the input x(n) 
and desired output s(n). As, here, s(n) and x(n) are periodic, one may pick a period of 
samples of x(n), with an arbitrary starting point, and put them in a tapped-delay line/shift 
register with its output connected back to its input. A similar shift register is also used to 
keep one cycle of s(n). An equalizer whose tap weights are adjusted to match its output 
y(n) with s(n) is then constructed as shown in Figure 17.25. 

Figure 17.25 The adaptation setup for a symbol-spaced cyclic equalizer.

The adaptation algorithm in Figure 17.25 can be any of the known adaptive algorithms,
including the LMS, NLMS, APLMS, or RLS. Also, once a cycle of $x(n)$ is received, the
adaptation process can begin and run for a sufficient number of iterations for the equalizer
to converge. To put this in a mathematical framework, we define the column vectors

$$\mathbf{x}_0 = [x(n)\ \ x(n-1)\ \cdots\ x(n-N+1)]^{\rm T}$$

and

$$\mathbf{w} = [w_0\ \ w_1\ \cdots\ w_{N-1}]^{\rm H}$$

where “T” and “H” denote transpose and Hermitian (i.e., conjugate transpose),
respectively. Also, if we define $\mathbf{x}_i$ as the circularly shifted version of $\mathbf{x}_0$ after $i$ shifts and use
the LMS algorithm for tap-weight adaptation, the repetition of the following loop will
converge to a tap-weight vector that closely approximates the optimum vector $\mathbf{w}_o$. Here,
the tap-weight vector $\mathbf{w}$ is initialized to an arbitrary value $\mathbf{w}(0)$. In practice, the common
choice for $\mathbf{w}(0)$ is the zero vector.


for $i = 0, 1, 2, \ldots$
    $e(i) = s(i \bmod N) - \mathbf{w}^{\rm H}(i)\,\mathbf{x}_i$
    $\mathbf{w}(i+1) = \mathbf{w}(i) + 2\mu\, e^*(i)\,\mathbf{x}_i$
end


In the above loop, “$i \bmod N$” reads $i$ modulo $N$, which means the integer remainder of
$i$ divided by $N$. It thus generates the ordered indices of the periodic sequence $s(0), s(1),
\ldots, s(N-1), s(0), s(1), \ldots$
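
The loop above can be tried out with the short Python sketch below. It is a minimal illustration, not the script from the accompanying website: the function name `cyclic_lms`, the length-8 constant-amplitude pilot, the two-tap noise-free channel, and the step size are all assumptions chosen here so that the example is small and converges quickly.

```python
import cmath


def cyclic_lms(x_period, s_period, mu, n_iter):
    """Run the cyclic LMS loop; return the final weights and the |e(i)| history."""
    n = len(x_period)
    w = [0j] * n                      # w(0) = 0, the common choice
    err = []
    for i in range(n_iter):
        # x_i = [x(i), x(i-1), ..., x(i-N+1)], indices taken modulo N
        x_i = [x_period[(i - k) % n] for k in range(n)]
        y = sum(wk.conjugate() * xk for wk, xk in zip(w, x_i))   # w^H(i) x_i
        e = s_period[i % n] - y
        w = [wk + 2 * mu * e.conjugate() * xk for wk, xk in zip(w, x_i)]
        err.append(abs(e))
    return w, err


N = 8
# Assumed pilot: one period of a unit-magnitude chirp sequence.
s = [cmath.exp(1j * cmath.pi * n * n / N) for n in range(N)]
# Assumed noise-free channel; the periodic input gives a periodic output.
h = [1.0, 0.3]
x = [sum(h[k] * s[(n - k) % N] for k in range(len(h))) for n in range(N)]

w, err = cyclic_lms(x, s, mu=0.02, n_iter=4000)
print(err[0], max(err[-N:]))          # error before and after adaptation
```

In this noise-free setting the error over the last cycle becomes negligibly small. With a channel that has a deep spectral null, the same loop would need far more iterations, which is the slow-convergence issue discussed later in this section.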
Alternatively, one may set up the problem of finding the optimum choice of the
tap-weight vector $\mathbf{w}$ as a least-squares problem with the cost function

$$\zeta = \|\mathbf{e}\|^2 = \mathbf{e}^{\rm H}\mathbf{e} \tag{17.84}$$

where $\mathbf{e} = [e(0)\ e(1)\ e(2) \cdots e(N-1)]^{\rm T}$. To proceed with a solution to this least-squares
problem, we note that by conjugating both sides of the first line in the above “for loop,”
we get

$$s^*(i) - \mathbf{x}_i^{\rm H}\mathbf{w} = e^*(i), \quad i = 0, 1, 2, \ldots, N-1 \tag{17.85}$$

Combining these equations, we obtain

$$\mathbf{s}^* - \mathbf{X}^{\rm H}\mathbf{w} = \mathbf{e}^* \tag{17.86}$$

where $\mathbf{X}$ is an $N \times N$ matrix whose columns are the equalizer tap-input vectors $\mathbf{x}_0$, $\mathbf{x}_1$,
$\mathbf{x}_2, \ldots, \mathbf{x}_{N-1}$. It is also interesting to note that, in the expanded form,

$$\mathbf{X} = \begin{bmatrix}
x(n) & x(n-N+1) & x(n-N+2) & \cdots & x(n-1) \\
x(n-1) & x(n) & x(n-N+1) & \cdots & x(n-2) \\
x(n-2) & x(n-1) & x(n) & \cdots & x(n-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x(n-N+1) & x(n-N+2) & x(n-N+3) & \cdots & x(n)
\end{bmatrix} \tag{17.87}$$


Now recall that the optimum value of $\mathbf{w}$ (in the cyclic equalizer) is the one that
minimizes the Euclidean norm of the vector $\mathbf{e}$. On the other hand, as the choice of
$\mathbf{w} = (\mathbf{X}^{\rm H})^{-1}\mathbf{s}^*$ results in $\mathbf{e} = \mathbf{0}$, which has the trivial minimum Euclidean norm of zero,
one may argue that the optimum value of $\mathbf{w}$ is obtained simply by solving the equation

$$\mathbf{X}^{\rm H}\mathbf{w} = \mathbf{s}^*. \tag{17.88}$$

We also note that the solution to this problem is unique when $\mathbf{X}^{\rm H}$ (or, equivalently, $\mathbf{X}$) is
full rank, or, in other words, the inverse of $\mathbf{X}$ exists.

When $\mathbf{X}^{\rm H}$ has a lower rank than its size, Eq. (17.88) is underdetermined and thus
does not have a unique solution. One method of dealing with this problem is to proceed
as follows. By multiplying Eq. (17.88) from the left by $\mathbf{X}$, we obtain

$$\mathbf{X}\mathbf{X}^{\rm H}\mathbf{w} = \mathbf{X}\mathbf{s}^*. \tag{17.89}$$

This is still an underdetermined equation. However, as the coefficient matrix $(\mathbf{X}\mathbf{X}^{\rm H})$ is
Hermitian, it is possible to modify Eq. (17.89) to a determined equation with a unique
solution. This is done by replacing the coefficient matrix $(\mathbf{X}\mathbf{X}^{\rm H})$ by $(\mathbf{X}\mathbf{X}^{\rm H} + \epsilon\mathbf{I})$, where
$\epsilon$ is a small positive constant. Hence, the cyclic equalizer tap weights may be obtained
by solving the equation

$$(\mathbf{X}\mathbf{X}^{\rm H} + \epsilon\mathbf{I})\mathbf{w} = \mathbf{X}\mathbf{s}^*. \tag{17.90}$$


It is also important to note that even in the cases where $\mathbf{X}$ is full rank and, thus,
Eq. (17.88) has a unique solution, the use of Eq. (17.90) is recommended. This is because
the additive/channel noise in $x(n)$ will introduce a bias on the equalizer tap weights that
may be very destructive when the equalizer is applied to the rest of the received signal
samples. The addition of $\epsilon\mathbf{I}$ may be thought of as a regularization step that moderates such
a bias.
A Low Complexity Method for Solving the Cyclic Equalizer Equation


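Because of the special form of the matrix $\mathbf{X}$, there is a low-complexity method for solving
Eq. (17.90). We note that $\mathbf{X}$ is a circular matrix, and recall from the theory of circular
matrices presented in Chapter 8 that when $\mathbf{X}$ is a circular matrix, it can be expanded as

$$\mathbf{X} = \mathbf{F}^{-1}\boldsymbol{\mathcal{X}}\mathbf{F} \tag{17.91}$$

where $\mathbf{F}$ is the DFT transformation matrix and $\boldsymbol{\mathcal{X}}$ is the diagonal matrix whose diagonal
elements are obtained by taking the DFT of the first column of $\mathbf{X}$. Substituting Eq. (17.91)
in Eq. (17.90) and noting that $\mathbf{F}\mathbf{F}^{-1} = \mathbf{I}$ and

$$\mathbf{X}^{\rm H} = \mathbf{F}^{-1}\boldsymbol{\mathcal{X}}^*\mathbf{F} \tag{17.92}$$

we get

$$\mathbf{F}^{-1}(\boldsymbol{\mathcal{X}}\boldsymbol{\mathcal{X}}^* + \epsilon\mathbf{I})\mathbf{F}\mathbf{w} = \mathbf{F}^{-1}\boldsymbol{\mathcal{X}}\mathbf{F}\mathbf{s}^* \tag{17.93}$$

Multiplying this equation from the left by $\mathbf{F}$ and rearranging the result, we obtain

$$\mathbf{w} = \mathbf{F}^{-1}(\boldsymbol{\mathcal{X}}\boldsymbol{\mathcal{X}}^* + \epsilon\mathbf{I})^{-1}\boldsymbol{\mathcal{X}}\mathbf{F}\mathbf{s}^* \tag{17.94}$$

Following Eq. (17.94), the computation of the cyclic equalizer tap weights can be
carried out by taking the following steps:


1. Compute the DFTs of the vectors $\mathbf{s}^*$ and $\mathbf{x}_0$, that is, compute $\mathbf{F}\mathbf{s}^*$ and $\mathbf{F}\mathbf{x}_0$.
Point-wise multiply the elements of the two DFT results. This gives the vector $\boldsymbol{\mathcal{X}}\mathbf{F}\mathbf{s}^*$ in
Eq. (17.94).

2. Point-wise divide the elements of the result of Step 1 by the elements of the vector
$|\mathbf{F}\mathbf{x}_0|^2 + \epsilon$. The result will be the vector $(\boldsymbol{\mathcal{X}}\boldsymbol{\mathcal{X}}^* + \epsilon\mathbf{I})^{-1}\boldsymbol{\mathcal{X}}\mathbf{F}\mathbf{s}^*$.

3. Taking the inverse DFT of the result of Step 2 gives the desired tap-weight vector $\mathbf{w}$.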
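
The three steps can be sketched in Python as follows. This is an illustrative sketch only: it uses a plain $O(N^2)$ DFT so that it needs nothing beyond the standard library (an FFT routine would be used in practice), and the names `dft`, `cyclic_equalizer_dft`, and `residual`, as well as the pilot, channel, and value of $\epsilon$, are assumptions made for the example. The helper `residual` builds $\mathbf{X}$ explicitly and checks that the result satisfies Eq. (17.90).

```python
import cmath


def dft(v, inverse=False):
    """Plain O(N^2) DFT (or inverse DFT with 1/N scaling)."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [z / n for z in out] if inverse else out


def cyclic_equalizer_dft(x0, s, eps):
    """Solve (X X^H + eps I) w = X s* via Steps 1 to 3 of the text."""
    fx = dft(x0)                                    # diagonal of script-X
    fs = dft([z.conjugate() for z in s])            # F s*
    num = [a * b for a, b in zip(fx, fs)]           # Step 1: point-wise product
    quo = [v / (abs(a) ** 2 + eps) for v, a in zip(num, fx)]   # Step 2
    return dft(quo, inverse=True)                   # Step 3: inverse DFT


def residual(x0, s, eps, w):
    """Largest element of |(X X^H + eps I) w - X s*|, with X built explicitly."""
    n = len(x0)
    X = [[x0[(r - c) % n] for c in range(n)] for r in range(n)]   # circular matrix
    s_conj = [z.conjugate() for z in s]
    xh_w = [sum(X[r][c].conjugate() * w[r] for r in range(n)) for c in range(n)]
    lhs = [sum(X[r][c] * xh_w[c] for c in range(n)) + eps * w[r] for r in range(n)]
    rhs = [sum(X[r][c] * s_conj[c] for c in range(n)) for r in range(n)]
    return max(abs(a - b) for a, b in zip(lhs, rhs))


N = 8
s = [cmath.exp(1j * cmath.pi * n * n / N) for n in range(N)]      # assumed pilot
h = [1.0, 0.3]                                                     # assumed channel
x = [sum(h[k] * s[(n - k) % N] for k in range(len(h))) for n in range(N)]
x0 = [x[(-k) % N] for k in range(N)]     # first column of X: x(0), x(-1), ...
eps = 1e-9
w = cyclic_equalizer_dft(x0, s, eps)
print(residual(x0, s, eps, w))
```

With an FFT in place of the plain DFT, the cost drops from that of solving an $N \times N$ system to a few transforms of length $N$, which is the point of the method.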


To gain a better understanding of the behavior of the cyclic equalizer, we present
some numerical examples. The MATLAB script “CyclicEqT_eval.m,” available on
the accompanying website, allows one to evaluate the performance of the symbol-spaced
cyclic equalizers. It evaluates the equalizer tap weights according to Eq. (17.90) and
substitutes the result in Eq. (17.71) to obtain the MSE that will be achieved when an
arbitrary data sequence is passed through the cascade of channel and equalizer. The
MATLAB script “CyclicEqT_eval.m” also evaluates the MMSE according to Eq.
(17.73), for comparison. The results of such comparisons for channels $c_1$ and $c_2$
(introduced in Section 17.4.2) and two choices of the timing phase for each are presented in
Table 17.1. Here, we have set $\sigma_{\nu_c} = 0.01$, $\alpha = 0.5$, the equalizer length $N = 32$, and the
results are based on one randomly selected cycle of the received signal.

The results presented in Table 17.1 reveal that the cyclic equalizer achieves some level
of equalization. However, the resulting MSE, for some cases (particularly, when the timing
phase is wrong and MMSE is relatively high) can be over an order of magnitude higher
than the minimum achievable MSE. In the sequel, we discuss a number of fixes to this
problem.

Table 17.1 Performance comparison of the symbol-spaced cyclic
equalizer with the achievable MMSE (the optimum equalizer).

Channel                   MMSE        MSE of Cyclic Equalizer
$c_1$, $\tau = T/2$       0.000105    0.000196
$c_1$, $\tau = 0$         0.0290      0.3217
$c_2$, $\tau = T/2$       0.000253    0.000654
$c_2$, $\tau = 0.9T$      0.0299      0.6634


Equalizer Design via Channel Estimation 


As noted earlier, direct computation of the equalizer tap weights, through the use of
an adaptive algorithm (e.g., the LMS algorithm) or using Eq. (17.90), may result in an
inaccurate design. An alternative method that results in better designs is to first identify
the channel and also obtain an estimate of the variance of the channel noise and then
use the design equations of Section 17.4.1 to obtain an estimate of the equalizer tap
weights.

The channel identification setup here will follow the same structure as the equalizer
structure in Figure 17.25, with the sequences $x(n)$ and $s(n)$ switched. Figure 17.26
presents a block diagram of such a channel estimator. The samples of the impulse response
of the equivalent baseband channel are the tap weights $h(0)$ through $h(N-1)$, which
should be found using an adaptive approach or through the solution of a system of
equations, similar to the procedures suggested earlier for finding the equalizer tap
weights, $w_i$.
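
As a rough illustration of the "system of equations" route, the Python sketch below estimates the channel from one period of the received signal by dividing DFTs. It relies on assumptions that go beyond the text: the channel is no longer than the period $N$, there is no noise, and the pilot has no spectral nulls, so that the circular convolution $x(n) = \sum_k h(k)\,s((n-k) \bmod N)$ can be inverted bin by bin. The names `dft` and `estimate_channel` and the example pilot and channel are hypothetical. The sketch returns the samples $h(n)$ directly rather than their conjugates.

```python
import cmath


def dft(v, inverse=False):
    """Plain O(N^2) DFT (or inverse DFT with 1/N scaling)."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [z / n for z in out] if inverse else out


def estimate_channel(x_period, s_period):
    """Noise-free cyclic channel estimate: h = IDFT( DFT(x) / DFT(s) )."""
    fx = dft(x_period)
    fs = dft(s_period)
    # Assumes every DFT bin of the pilot is nonzero.
    return dft([a / b for a, b in zip(fx, fs)], inverse=True)


N = 8
s = [cmath.exp(1j * cmath.pi * n * n / N) for n in range(N)]    # assumed pilot
h_true = [1.0 + 0j, 0.3 + 0j] + [0j] * (N - 2)                  # assumed channel
x = [sum(h_true[k] * s[(n - k) % N] for k in range(N)) for n in range(N)]

h_est = estimate_channel(x, s)
print([round(abs(v), 6) for v in h_est])
```

In the presence of noise, one would regularize the division in the same way that $\epsilon\mathbf{I}$ is added in Eq. (17.90), or average over several received periods.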

There are two advantages to the approach of equalizer design via channel identification: 


1. In most practical channels, the duration of the impulse response h(n) is limited
and can be well approximated within the specified cyclic length of N samples. The
optimum equalizer, whose role is to realize the inverse of the channel transfer function,
usually has a much longer length. Strictly speaking, the optimum equalizer has
an infinite length; it is often called an unconstrained equalizer (Chapter 3). In practice,
when the length of the equalizer is limited to N, one should apply a constraint to the
optimization while designing the equalizer. The cyclic equalizer does not perform such
optimization. As it only considers the signal samples over a grid of frequencies with N
samples over the range 0 ≤ ω < 2π, it simply obtains a set of equalizer tap weights,
which, in the absence of channel noise, are time-aliased samples of the unconstrained
equalizer, following the sampling theorem. By adopting the method of equalizer design
via channel identification, the constrained optimization is readily applied.

2. When the LMS (or NLMS) algorithm is used to adjust the tap weights of the equalizer, 
the cyclic equalizer may experience some very slow modes of convergence and, thus, 
may need many thousands of iterations before it converges. The slow convergence 
happens in cases where there exists a deep fade over a portion of the equivalent 
baseband channel; a situation that is caused by the channel and thus is out of the 
designer’s control. In the case of the channel estimation approach of Figure 17.26, on
the other hand, the designer has full control over the choice of the sequence s(n), whose
correlation properties determine the convergence behavior of the LMS algorithm; see
further elaborations later.

Figure 17.26 The system setup for a symbol-spaced cyclic channel identification.


Selection of Pilot Sequence 


Next, we discuss the desirable properties of the pilot sequence s(n), and present a class
of pilot sequences that hold such properties. To this end, we note that the dual of
Eq. (17.88), here, is

$$\mathbf{S}^{\mathrm{H}}\mathbf{h} = \mathbf{x}^{*} \tag{17.95}$$

where

$$\mathbf{x} = [x(n)\;\; x(n+1)\;\cdots\; x(n+N-1)]^{\mathrm{T}} \tag{17.96}$$

$$\mathbf{h} = [h(0)\;\; h(1)\;\cdots\; h(N-1)]^{\mathrm{T}} \tag{17.97}$$

and

$$\mathbf{S} = \begin{bmatrix}
s(N-1) & s(0) & s(1) & \cdots & s(N-2) \\
s(N-2) & s(N-1) & s(0) & \cdots & s(N-3) \\
s(N-3) & s(N-2) & s(N-1) & \cdots & s(N-4) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
s(0) & s(1) & s(2) & \cdots & s(N-1)
\end{bmatrix} \tag{17.98}$$

Multiplying Eq. (17.95) from the left by S and solving for h, we get

$$\mathbf{h} = (\mathbf{S}\mathbf{S}^{\mathrm{H}})^{-1}(\mathbf{S}\mathbf{x}^{*}) \tag{17.99}$$

This solution would become trivial if we had chosen the pilot sequence s(n) such that
S was an orthogonal matrix, that is, if $\mathbf{S}\mathbf{S}^{\mathrm{H}} = K\mathbf{I}$, where K is a constant and I is the
identity matrix. In that case, Eq. (17.99) reduces to

$$\mathbf{h} = \frac{1}{K}\mathbf{S}\mathbf{x}^{*}, \tag{17.100}$$

which shows that the channel impulse response vector h can be obtained by premultiplying
$\mathbf{x}^{*}$ with the matrix $\frac{1}{K}\mathbf{S}$.

In addition to the orthogonality property of S, in practice, it is desirable to choose
samples s(n) of equal amplitude, so that the transmit power is uniformly spread across
time. It turns out that such sequences exist. They are called polyphase codes (Chu, 1972).
They exist for any length, N. A particular construction of polyphase codes that we use
here follows the formula

$$s(n) = \begin{cases}
e^{j\pi n^{2}/N} & \text{for } N \text{ even} \\
e^{j\pi n(n+1)/N} & \text{for } N \text{ odd}
\end{cases} \tag{17.101}$$

for n = 0, 1, ..., N − 1. The MATLAB function “CycPilot.m” on the accompanying
website can be used to generate pilot sequences based on the formula (17.101).
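As an illustration of the two properties discussed above, the following is a minimal Python sketch (not the book's “CycPilot.m”; the function names `chu_pilot` and `cyclic_autocorrelation` are hypothetical) that generates a polyphase sequence according to the Chu construction with unit phase multiplier and evaluates its periodic autocorrelation. Because the rows of S are cyclic shifts of s(n), the entries of $\mathbf{S}\mathbf{S}^{\mathrm{H}}$ are these autocorrelation values, so a zero autocorrelation at all nonzero lags corresponds to $\mathbf{S}\mathbf{S}^{\mathrm{H}} = K\mathbf{I}$ with K = N.

```python
import cmath

def chu_pilot(N):
    """Unit-amplitude polyphase (Chu) sequence of length N."""
    if N % 2 == 0:
        return [cmath.exp(1j * cmath.pi * n * n / N) for n in range(N)]
    return [cmath.exp(1j * cmath.pi * n * (n + 1) / N) for n in range(N)]

def cyclic_autocorrelation(s, lag):
    """Periodic autocorrelation sum_n s(n + lag) s*(n), indices taken modulo N."""
    N = len(s)
    return sum(s[(n + lag) % N] * s[n].conjugate() for n in range(N))

# Example: magnitudes of the autocorrelation at every lag for N = 8 and N = 9.
for N in (8, 9):
    s = chu_pilot(N)
    print(N, [round(abs(cyclic_autocorrelation(s, lag)), 6) for lag in range(N)])
```

Only the zero-lag value should be significantly different from zero, and every sample has unit magnitude.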


Impact of the Channel Noise 


In the equations given earlier, for simplicity of derivations, the channel noise was ignored.
If the channel noise is included, Eq. (17.95) becomes

$$\mathbf{S}^{\mathrm{H}}\mathbf{h} + \mathbf{v}^{*} = \mathbf{x}^{*} \tag{17.102}$$

where v is the noise vector associated with x. Multiplying Eq. (17.102) from the left by
$\frac{1}{K}\mathbf{S}$, we get

$$\hat{\mathbf{h}} = \mathbf{h} + \frac{1}{K}\mathbf{S}\mathbf{v}^{*} = \frac{1}{K}\mathbf{S}\mathbf{x}^{*} \tag{17.103}$$

where $\hat{\mathbf{h}}$ is a noisy estimate of h.

One method of improving the estimate of h is to transmit multiple periods of pilot
symbols and replace the vector x by its average obtained by averaging over multiple
periods.
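The estimator of Eqs. (17.100) and (17.103), including the averaging over several pilot periods, can be sketched in Python as follows. This is an illustrative sketch only: the helper names are hypothetical, the pilot is assumed to be a unit-amplitude polyphase sequence (so that K = N), and `received_block` simply generates a noise-free vector x that satisfies Eq. (17.95) for a given h.

```python
import cmath

def chu_pilot(N):
    """Unit-amplitude polyphase sequence of length N (assumed pilot)."""
    if N % 2 == 0:
        return [cmath.exp(1j * cmath.pi * n * n / N) for n in range(N)]
    return [cmath.exp(1j * cmath.pi * n * (n + 1) / N) for n in range(N)]

def pilot_matrix(s):
    """Circulant matrix S of Eq. (17.98): row i, column k holds s((N-1-i+k) mod N)."""
    N = len(s)
    return [[s[(N - 1 - i + k) % N] for k in range(N)] for i in range(N)]

def received_block(s, h):
    """Noise-free x that satisfies S^H h = x^* (Eq. 17.95)."""
    N = len(s)
    S = pilot_matrix(s)
    x_conj = [sum(S[i][k].conjugate() * h[i] for i in range(N)) for k in range(N)]
    return [v.conjugate() for v in x_conj]

def estimate_channel(s, x_periods):
    """h_hat = (1/K) S x^*, with x averaged over the supplied periods and K = N."""
    N = len(s)
    S = pilot_matrix(s)
    x_avg = [sum(p[k] for p in x_periods) / len(x_periods) for k in range(N)]
    return [sum(S[i][k] * x_avg[k].conjugate() for k in range(N)) / N
            for i in range(N)]

# Example: two identical noise-free periods return the channel used to build them.
s = chu_pilot(8)
h_true = [0.1, 1.0, 0.5 - 0.2j, 0, 0, 0, 0, 0]
x = received_block(s, h_true)
h_hat = estimate_channel(s, [x, x])
```

With noisy periods, the same call averages the noise before the multiplication by $\frac{1}{K}\mathbf{S}$, which is the improvement described above.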
Estimation of the Variance of the Channel Noise


To obtain an estimate of the variance of the channel noise, we take the following approach.
As discussed earlier, many operations at the receiver can be greatly simplified by sending
a few cycles of the periodic pilot sequence s(n). Assuming that we have been able to
identify a portion of the received signal sequence x(n) that is associated with the periodic
sequence s(n), and noting that

$$x(n) = h(n) \star s(n) + v(n) \tag{17.104}$$

where $\star$ denotes convolution, one finds that the first term on the right-hand side of
Eq. (17.104) is periodic. Hence,

$$z(n) = x(n) - x(n+N) = v(n) - v(n+N). \tag{17.105}$$

Now, if we assume that v(n) and v(n + N) are uncorrelated, a time average of $|z(n)|^{2}$
gives an estimate of $2\sigma_v^{2}$.
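The estimate implied by Eq. (17.105) can be sketched in a few lines of Python. The function names below are hypothetical, and the test signal (a short periodic sequence plus complex Gaussian noise of variance $\sigma_v^2$) is an assumption made only for illustration.

```python
import random

def noise_variance_estimate(x, N):
    """Half the time average of |z(n)|^2, where z(n) = x(n) - x(n + N)."""
    diffs = [x[n] - x[n + N] for n in range(len(x) - N)]
    return sum(abs(z) ** 2 for z in diffs) / (2 * len(diffs))

def noisy_pilot_signal(period, repeats, sigma2, rng):
    """Periodic signal plus complex Gaussian noise with E|v(n)|^2 = sigma2."""
    std = (sigma2 / 2.0) ** 0.5
    return [p + complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
            for _ in range(repeats) for p in period]

# Example: the estimate should be close to the noise variance used (0.04 here).
period = [complex(1, 0), complex(0, 1), complex(-1, 0), complex(0.5, -0.5)]
x = noisy_pilot_signal(period, 2000, 0.04, random.Random(1))
sigma2_hat = noise_variance_estimate(x, len(period))
```

The periodic part cancels exactly in z(n), so the estimate does not require knowledge of the channel or of the pilot symbols.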


Comparisons 


To give an idea of the performance difference between a direct cyclic equalizer design and
its indirect counterpart where an estimate of the channel is used to design the equalizer, 
we present some numerical results. The results are presented in Table 17.2. This is an 
extension of Table 17.1, by adding the last column. As was predicted earlier, the results 
clearly show a superior performance of the indirect method. In the indirect case, the two 
periods of x(n) that are used in Eq. (17.105) are averaged to obtain a less noisy version 
of x before application of Eq. (17.100). Further comparisons are left to the reader. 

It is also worth noting that whereas in the direct method the equalizer length must be 
equal to the length of the pilot symbols, when the indirect method is adopted one has the 
freedom of choosing any arbitrary length for the equalizer. 


17.6.2 Fractionally Spaced Cyclic Equalizer 


Fractionally spaced cyclic equalizer structures are obtained by simple modifications to
their symbol-spaced counterparts. Figure 17.27 presents a system setup for a half
symbol-spaced direct cyclic equalizer. Note that, as the rate of the samples of x(n) here is
twice that of s(n), the delay between the pilot symbols s(n) is $z^{-2}$. Also, because it is only after
every two clock cycles that one new sample of s(n) will be available for comparison with
the equalizer output, the adaptation algorithm should be iterated after every two clock
cycles. Moreover, as the cyclic equalizer should have the same time span as the pilot symbols,
the equalizer length here is 2N.


Table 17.2 Performance comparison of the cyclic equalizers for direct
and indirect setting.

                                  MSE of Cyclic Equalizer
Channel            MMSE        Direct      Indirect
c1, τ = T/2        0.000105    0.000196    0.000164
c1, τ = 0          0.0290      0.3217      0.0292
c2, τ = T/2        0.000253    0.000654    0.0003743
c2, τ = 0.9T       0.0299      0.6634      0.0351


Figure 17.27 The adaptation setup for a half symbol-spaced cyclic equalizer.

In the case of indirect cyclic equalizer, multiple samples of the channel impulse response 
should be estimated per symbol interval. This can be easily done by setting up a set of 
parallel channel estimators similar to the one in Figure 17.26. If the samples of the channel 
impulse response at a spacing T/L are required, L parallel channel estimators will be 
used. Each channel estimator operates on a set of symbol-spaced samples of x(n) that 
correspond to one of the L time phases. The estimated samples of the channel responses 
will be interleaved to construct the desired response. 

The general conclusions that were derived in the case of the symbol-spaced equalizer 
also apply here. In particular, direct computation of the equalizer tap weights using an 
equation such as Eq. (17.88) or Eq. (17.89) leads to results that are relatively far from 
the optimum tap weights. On the other hand, an indirect method that uses an estimate of
the channel response and noise variance to calculate the equalizer tap weights results in 
much improved results. 


17.6.3 Alignment of s(n) and x(n) 


So far, we have assumed that the receiver is able to identify the beginning of each cycle 
of the received signal that matches with the transmitted pilots. Actually, this assumption 
may never be true in practice. In a practical receiver, one can only obtain a coarse estimate 
of the boundaries of the beginning and end of the cyclic preamble. 

When s(n) and x(n) are not time-aligned, the estimated equalizer tap weights or 
the channel impulse response will be replaced by their time-shifted replicas. Also, as 
the signals are periodic with period of N samples, these shifts will be cyclic, that is, the 
samples rotate within a vector of length N (2N in the case of the half symbol-spaced 
equalizer). To remove the ambiguity caused by such rotation, the samples associated with
the equalizer tap weights or the channel impulse response are rotated so that the larger tap
weights/samples are positioned around the middle of the estimated vector. This method
works well in most cases, because both the equalizer tap weights and the samples of
the channel impulse response are sequences that grow from zero to a maximum (roughly
at the middle) and decay to zero afterward.
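The rotation described above can be sketched as follows in Python. As a simplifying assumption, the sketch centers the single largest-magnitude sample rather than a group of large samples; the function name `center_largest_tap` is hypothetical.

```python
def center_largest_tap(taps):
    """Cyclically rotate taps so that the largest-magnitude entry sits at index N // 2."""
    N = len(taps)
    peak = max(range(N), key=lambda i: abs(taps[i]))
    shift = (N // 2 - peak) % N
    return [taps[(i - shift) % N] for i in range(N)]

# Example: an estimate whose main lobe has wrapped around the end of the vector.
rotated = center_largest_tap([0.5, 0.1, 0.0, 0.0, 0.0, 0.1, 0.4, 1.0])
```

For the half symbol-spaced case the same function applies to the vector of length 2N.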


17.6.4 Carrier and Timing Phase Acquisition and Tracking 


Our discussion in this section so far was based on the assumption that the received cyclic 
signal was free of any frequency offset and in the case of symbol-spaced equalizer a proper 
timing phase was selected a priori. However, in practice, it is the task of the receiver to 
find a proper timing phase and to compensate for any residual carrier frequency offset in 
the demodulated received baseband signal. 


Carrier Acquisition and Tracking 


As in the case of the noise variance estimation, here also we assume that the cyclic
preamble consists of a few cycles of the pilot sequence s(n). In that case, in the presence
of a carrier frequency offset $\Delta\omega_c$, the demodulated signal samples x(n) are replaced by

$$x'(n) = e^{j\Delta\omega_c n}x(n) + v(n). \tag{17.106}$$

Note that we have ignored the rotation of the noise samples v(n) that may be caused by
the carrier frequency offset, as such rotation does not affect the statistics of the result.
Recalling that during the cyclic preamble (assuming that the samples are taken at symbol
rate, 1/T) x(n) = x(n + N) and assuming that the noise samples v(n) are small enough,
one may argue that

$$\sum_{n=n_0}^{K+n_0-1} x'(n+N)\,x'^{*}(n) \approx e^{j\Delta\omega_c N}\sum_{n=n_0}^{K+n_0-1} |x(n)|^{2} \tag{17.107}$$

where $n_0$ is the starting point and K is the number of sample pairs that are used. Solving
Eq. (17.107) for $\Delta\omega_c$, we obtain

$$\Delta\omega_c \approx \frac{1}{N}\,\angle\!\left(\sum_{n=n_0}^{K+n_0-1} x'(n+N)\,x'^{*}(n)\right) \tag{17.108}$$

where $\angle(\cdot)$ denotes the angle of its argument.
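A minimal Python sketch of the estimator in Eq. (17.108) is given below. The function name and the test signal are hypothetical. Note that, because the angle is only known modulo 2π, the estimate is unambiguous only when the magnitude of the offset accumulated over one period is less than π.

```python
import cmath

def carrier_offset_estimate(x_rx, N, n0, K):
    """Angle of sum_{n=n0}^{n0+K-1} x'(n+N) x'*(n), divided by the period N."""
    acc = sum(x_rx[n + N] * x_rx[n].conjugate() for n in range(n0, n0 + K))
    return cmath.phase(acc) / N

# Example: a noise-free periodic signal rotated by a known offset (rad/sample).
N = 8
base = [complex(1, 0), complex(0, 1), complex(-1, 0), complex(0, -1),
        complex(1, 1), complex(-1, 1), complex(0.5, 0), complex(0, -0.5)]
offset = 0.01
x_rx = [cmath.exp(1j * offset * n) * base[n % N] for n in range(3 * N)]
offset_hat = carrier_offset_estimate(x_rx, N, 0, 2 * N)
```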


Timing Phase Acquisition and Tracking 


For the purpose of timing phase tuning, we should have access to the samples of the 
received baseband signal at a rate higher than its Nyquist rate. Let x(n) denote a sequence 
of signal samples at a rate L/T. To select a good timing phase, one may take one of the 
following steps (other alternatives are also possible): 


1. If the equalizer is a fractionally spaced one, any choice of the timing phase, as long 
as it is maintained throughout the payload, will work well. A convenient method thus 
may be to examine all (quantized) timing phases of 0, T/L, 2T/L, ..., (L − 1)T/L,
evaluate the signal power over one or more cycles of the cyclic preamble, and choose 
the one that results in the maximum signal power. Subsequently, either the early-late 
gate or the gradient-based timing recovery that was presented earlier in this chapter 
may be used to maintain the timing phase. 

2. A similar method may be used in the case of a symbol-spaced equalizer. However, 
as demonstrated earlier, the choice of timing phase based on the power maximization 
does not always lead to an acceptable equalizer performance. 

3. For the symbol-spaced equalizers, an alternative to Step 2 is to first identify the channel 
impulse response at the rate L/T samples per second. Then, evaluate the channel 
magnitude response when sampled at the symbol rate for different timing phases. 
From these choices, pick the one with the least variation in the magnitude response 
and set the timing phase accordingly. To make sure that this timing phase will be 
maintained during the payload, identify the offset of this timing phase with respect 
to the one that results in the power maximization. Moreover, during the payload, one 
may choose to adopt either the early-late gate or the gradient-based timing recovery 
to make sure that no timing drift occurs. 
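The power-maximization search of Step 1 can be sketched as follows in Python, assuming that x holds samples taken at the rate L/T over one or more cycles of the cyclic preamble. The function name is hypothetical.

```python
def timing_phase_by_power(x, L):
    """Return (best_phase, powers): average power of each of the L timing phases."""
    powers = []
    for phase in range(L):
        branch = x[phase::L]  # symbol-spaced samples at timing phase `phase` * T/L
        powers.append(sum(abs(v) ** 2 for v in branch) / len(branch))
    best = max(range(L), key=lambda p: powers[p])
    return best, powers

# Example with L = 2: the odd-indexed samples carry most of the signal power.
best_phase, phase_powers = timing_phase_by_power([0.1, 1.0, -0.1, -1.0, 0.1, 1.0], 2)
```

As noted in Step 2, for a symbol-spaced equalizer the phase selected this way is not guaranteed to give an acceptable equalizer performance.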


17.7 Joint Timing Recovery, Carrier Recovery, and Channel 
Equalization 


In a receiver where the three functions of timing recovery, carrier recovery, and channel
equalization are performed concurrently, these will interact. In particular, the channel
equalizer can compensate for any phase offset and a residual (but small) carrier
frequency offset. The phase offset θ is compensated simply by multiplying all the equalizer
tap weights with the same factor $e^{-j\theta}$. Also, a carrier frequency offset $\Delta f_c$ will be
automatically compensated for when the equalizer adaptation is fast enough to track the
time-varying carrier phase $2\pi\Delta f_c nT_s$.

The equalizer may also adapt to compensate for any gradual timing phase drift. In fact, 
when a fractionally spaced equalizer is used in a receiver, it is possible to completely 
remove the timing recovery block and let the equalizer track the timing phase drift. In 
such an implementation, as the timing phase drifts, the equalizer tap weights also drift to 
the right or left (depending on the direction of the timing phase drift). If the receiver runs 
for a long time, the timing phase drift may result in an excessive shift of the equalizer 
coefficients to the right or left and result in a significant degradation in performance. 
Fortunately, this problem can be easily fixed. While running the adaptation algorithm, 
one may keep track of the largest tap weight of the equalizer. If it drifts too far from the 
center, the equalizer tap weights are shifted to the right or left to move the tap weight
with the largest magnitude to the middle. See also the discussion in Section 17.6.3.


17.8 Maximum Likelihood Detection 


The equalization methods that were discussed in the previous sections may be classified 
as members of suboptimum detectors. An optimum detector can be constructed when one 
has perfect knowledge of the channel impulse response and the statistical properties of the 
channel noise. Such a detector searches over all possible combinations of the transmitted 
data symbols sequence and finds the one that matches best with the received signal. 
However, the detector complexity grows exponentially with the number of the transmitted 
data symbols and, thus, its implementation becomes prohibitive in practice. Fortunately, 
with a negligible loss in performance, the optimal detector can be implemented by using 
an elegant method called the Viterbi algorithm. The Viterbi algorithm also has a complexity
that grows exponentially with the length of the channel impulse response. Hence, for
a channel with long duration, even the use of the Viterbi algorithm may be insufficient to
provide a feasible implementation. To solve this problem, and yet achieve a near optimum 
performance, it is often proposed to design and use an equalizer that limits the channel 
duration to a finite length. Later, we present a formulation for the design of such equalizers, 
which we refer to as channel memory truncation. 

For the details of the Viterbi algorithm and other soft equalization methods (some of which
are discussed in the next section), interested readers should refer to the more relevant
texts, such as Lin and Costello (2004). The idea here is only to bring to the attention
of readers these more advanced techniques that in recent years have become common
practice in the industry.


Channel Memory Truncation 


Following a similar line of thought to that in Section 17.4, we consider the system setup
presented in Figure 17.28 for the study of the channel memory truncation in the case of
a symbol-spaced equalizer. Here, $D(z) = \sum_{i=0}^{L-1} d_i z^{-i}$ is a desired transfer function of a
limited length L. Both the equalizer W(z) and the desired response D(z) are unknown
and should be optimized jointly to minimize the mean square of the output error, e(n).

Figure 17.28 System setup for a symbol-spaced equalizer for channel memory truncation.


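Following the same line of derivations as those presented in Section 17.4.1, it is not
difficult to show that Eq. (17.69) is also applicable here, with

$$\mathbf{d} = [\underbrace{0\;\cdots\;0}_{\Delta\ \text{zeros}}\;\; d_0\;\; d_1\;\cdots\; d_{L-1}\;\; 0\;\cdots\;0]^{\mathrm{T}} \tag{17.109}$$

Moreover, Eqs. (17.71) and (17.72) also follow. Assuming d is given, substituting
Eq. (17.72) in Eq. (17.71), and recalling that $\mathbf{R} = \mathbf{Q}^{\mathrm{T}}\mathbf{Q}^{*}$ and $\mathbf{p} = \mathbf{Q}^{\mathrm{T}}\mathbf{d}^{*}$, after some
straightforward manipulations, we obtain

$$\xi(\mathbf{w}_o) = \sigma_s^{2}\,\mathbf{d}^{\mathrm{H}}\left[\mathbf{I} - \mathbf{Q}(\mathbf{Q}^{\mathrm{H}}\mathbf{Q})^{-1}\mathbf{Q}^{\mathrm{H}}\right]\mathbf{d} \tag{17.110}$$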


Next, we should find d that minimizes $\xi(\mathbf{w}_o)$. To this end, we first note that Eq. (17.110)
may be reduced to

$$\xi(\mathbf{w}_o) = \sigma_s^{2}\,\mathbf{d}_r^{\mathrm{H}}\mathbf{A}\,\mathbf{d}_r, \tag{17.111}$$

where $\mathbf{d}_r$ is the vector d after removing all zeros from its beginning and end, that is,

$$\mathbf{d}_r = [d_0\;\; d_1\;\cdots\; d_{L-1}]^{\mathrm{T}} \tag{17.112}$$

and A is a submatrix of $\mathbf{I} - \mathbf{Q}(\mathbf{Q}^{\mathrm{H}}\mathbf{Q})^{-1}\mathbf{Q}^{\mathrm{H}}$, obtained by keeping the rows and columns
that are compatible with $\mathbf{d}_r$. Moreover, as for any $\mathbf{d}_r$, $\xi(\mathbf{w}_o)$ is always greater than or
equal to zero, the matrix A is a positive semidefinite matrix. Also, to avoid the trivial choice
of $\mathbf{d}_r = \mathbf{0}$ that sets $\xi(\mathbf{w}_o)$ equal to its trivial minimum of zero, we impose the unity length
constraint $\|\mathbf{d}_r\|^{2} = \mathbf{d}_r^{\mathrm{H}}\mathbf{d}_r = 1$ on $\mathbf{d}_r$. With this constraint, recalling the minimax theorem
that was introduced in Chapter 4, one will find that the optimum choice of $\mathbf{d}_r$ is the
eigenvector of A that is associated with its minimum eigenvalue. Hence, we may write

$$\mathbf{d}_{r,o} = \text{the eigenvector of } \mathbf{A} \text{ associated with } \lambda_{\min}(\mathbf{A}) \tag{17.113}$$

where $\lambda_{\min}(\mathbf{A})$ means the minimum eigenvalue of A. Moreover, substituting this solution
in Eq. (17.110), one will find that

$$\xi_{\min} = \xi(\mathbf{w}_o, \mathbf{d}_{r,o}) \tag{17.114}$$

$$\phantom{\xi_{\min}} = \sigma_s^{2}\,\lambda_{\min}(\mathbf{A}) \tag{17.115}$$

LMS Algorithm for Channel Memory Truncation 


The channel memory truncation concept that was developed earlier was first introduced in
Falconer and Magee (1973). In this work, the authors also proposed an LMS algorithm for
joint adaptation of the equalizer coefficients $w_i$ and the desired (truncated) response $d_i$.
They viewed the combination of W(z) and $z^{-\Delta}D(z)$ of Figure 17.28 as a linear combiner
with inputs from the samples of x(n) and s(n). The LMS algorithm is applied to minimize
the mean squared value of the output error e(n). Clearly, the direct application of the LMS
algorithm results in the trivial and unacceptable solution $\mathbf{w} = \mathbf{0}$ and $\mathbf{d}_r = \mathbf{0}$. To avoid this
solution, Falconer and Magee have suggested that $\mathbf{d}_r$ should be renormalized to the length
of unity after each iteration. That is, the updated vector $\mathbf{d}_r(n)$, after each iteration of the
LMS algorithm, is normalized as

$$\mathbf{d}_r(n) \leftarrow \frac{\mathbf{d}_r(n)}{\sqrt{\mathbf{d}_r^{\mathrm{H}}(n)\,\mathbf{d}_r(n)}} \tag{17.116}$$


17.9 Soft Equalization 


The sequence estimator that was discussed in the previous section searches for the
sequence of data symbols that, when convolved with the channel impulse response, results
in a signal that best matches the received signal. Hence, the name maximum-likelihood
detector has been adopted. More modern communication systems take a different
approach, which may be thought of as a modification to the former.

An information sequence a(n) of N, bits, before transmission, is coded and converted 
to M, > N, coded bits, b(n). The coded bits, b(n), are related to the uncoded bits a(n) 
and are correlated with one another through the coding scheme. At the front-end of the
receiver, a detector, known as soft equalizer, evaluates the probability of each coded bit
for being 0 or 1 and delivers a number called log-likelihood ratio (LLR). For a given
coded bit b(n), its LLR value is defined as

$$\lambda_1(b(n)) = \ln\frac{P(b(n) = 0 \mid \mathbf{x}, \mathbf{h}, \sigma_v^{2})}{P(b(n) = 1 \mid \mathbf{x}, \mathbf{h}, \sigma_v^{2})} \tag{17.117}$$

where P(·|·) is the conditional probability function, x is the vector of the received signal,
h is the vector of channel impulse response, and $\sigma_v^{2}$ is the channel noise variance.

The information collected by the soft equalizer is passed to a soft decoder that takes
into account the correlation between the coded bits and improves on the LLR values. The
difference between the new LLR values and their counterparts from the soft equalizer
makes a set of new information that was unknown to the soft equalizer. This information,
which is called extrinsic, is passed to the soft equalizer, to obtain more accurate values of
LLR whose extrinsic part will be passed to the soft decoder for further processing. This
exchange of information between the soft equalizer and decoder continues for a number
of iterations to converge. This process is often referred to as turbo iteration, and the
corresponding receiver is called turbo equalizer. After convergence of turbo iterations,
the decoder decides on its best guess of the uncoded (information) bits, a(n).

Figure 17.29 A transceiver communication system equipped with a turbo receiver.

Figure 17.29 presents a block diagram of a transceiver communication system that
is equipped with a turbo equalizer. The LLR values out of the soft equalizer and the
soft decoder are called $\lambda_1$ and $\lambda_2$, respectively. The extrinsic values are identified by
the superscript “e,” $\lambda_1^{e}$ and $\lambda_2^{e}$, respectively. There is also a block called interleaver
between the encoder and modulator at the transmitter. This block shuffles the coded bits
before being passed to the modulator. This is to make sure that the neighboring bits in
the transmitted sequence are uncorrelated with one another. This process will be undone
in transferring the LLR values from the soft equalizer to the soft decoder, and will be
reverted again in transferring the LLR values from the soft decoder to the soft equalizer.

Any discussion on the coding and soft decoding is beyond the scope of this book. Here,
we only concentrate on the methods of soft equalization. A soft equalizer, as presented
in Figure 17.29, has two inputs: the received signal x(n) and the a priori information
(the extrinsic LLR values) from the soft decoder. There are a variety of methods
in the literature that discuss different implementations of the soft equalizers. Two of
these methods are presented in the sequel. The first method is a modified form of the
conventional MMSE linear equalizer, which will take into account the a priori information
$\lambda_2^{e}$ to improve on the estimates (the LLR values) of the coded bits b(n). The second method
takes a statistical approach for combining the a priori information $\lambda_2^{e}$ and the received
signal samples for updating the LLR values at each iteration of the turbo equalizer. The
references cited later are for an interested reader to begin with and explore the related
literature further.
17.9.1 Soft MMSE Equalizer


The material presented in this section is a simplified version of the soft MMSE equalizer
of Tüchler et al. (2002). The presentation in this work considers the case where the
modulator in Figure 17.29 converts the coded bits b(n) to a sequence of QAM symbols,
s(n). Here, we present the derivations for the case where the modulator converts the
coded bits b(n) to the binary phase-shift keying (BPSK) symbols that take values of −1,
for b(n) = 0, and +1, for b(n) = 1. We also limit our discussion to the case where the
channel impulse response and the equalizer coefficients are real-valued.

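When there exists some a priori information about the transmitted symbols s(n), the
Wiener filter estimator will be biased and thus the equalizer output (assuming a linear,
symbol-spaced, or fractionally spaced equalizer) should be written as

$$y(n) = \mathbf{w}^{\mathrm{T}}\mathbf{x}(n) + b \tag{17.118}$$

where b is the bias. Hence, here, we are interested in minimizing the cost function

$$\xi = E[(s(n-\Delta) - y(n))^{2}] \tag{17.119}$$

where Δ, as defined before, is the delay caused by the combination of the channel and
equalizer. To proceed, we define

$$\tilde{\mathbf{w}} = \begin{bmatrix} \mathbf{w} \\ b \end{bmatrix} \tag{17.120}$$

and

$$\tilde{\mathbf{x}}(n) = \begin{bmatrix} \mathbf{x}(n) \\ 1 \end{bmatrix} \tag{17.121}$$

and note that Eq. (17.118) can be written as

$$y(n) = \tilde{\mathbf{w}}^{\mathrm{T}}\tilde{\mathbf{x}}(n) \tag{17.122}$$

Using Eq. (17.122) in Eq. (17.119) and expanding the results, we obtain

$$\xi = E[|s(n-\Delta)|^{2}] - \tilde{\mathbf{w}}^{\mathrm{T}}\tilde{\mathbf{p}} - \tilde{\mathbf{p}}^{\mathrm{T}}\tilde{\mathbf{w}} + \tilde{\mathbf{w}}^{\mathrm{T}}\tilde{\mathbf{R}}\tilde{\mathbf{w}} \tag{17.123}$$

where $\tilde{\mathbf{R}} = E[\tilde{\mathbf{x}}(n)\tilde{\mathbf{x}}^{\mathrm{T}}(n)]$ and $\tilde{\mathbf{p}} = E[s(n-\Delta)\tilde{\mathbf{x}}(n)]$. Using Eq. (17.121), one will find
that

$$\tilde{\mathbf{R}} = \begin{bmatrix} \mathbf{R} & E[\mathbf{x}(n)] \\ E[\mathbf{x}^{\mathrm{T}}(n)] & 1 \end{bmatrix} \tag{17.124}$$

and

$$\tilde{\mathbf{p}} = \begin{bmatrix} \mathbf{p} \\ E[s(n-\Delta)] \end{bmatrix} \tag{17.125}$$

Following similar derivations to those in Chapter 3, minimization of the cost function
Eq. (17.123) leads to the Wiener–Hopf equation $\tilde{\mathbf{R}}\tilde{\mathbf{w}}_o = \tilde{\mathbf{p}}$, which, using Eqs. (17.124)
and (17.125), can be written as

$$\begin{bmatrix} \mathbf{R} & E[\mathbf{x}(n)] \\ E[\mathbf{x}^{\mathrm{T}}(n)] & 1 \end{bmatrix}\begin{bmatrix} \mathbf{w}_o \\ b_o \end{bmatrix} = \begin{bmatrix} \mathbf{p} \\ E[s(n-\Delta)] \end{bmatrix} \tag{17.126}$$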
Note that, here, the subscript “o” has been added to emphasize that the parameters are
set at their optimum values.
Solving Eq. (17.126) for $\mathbf{w}_o$ and $b_o$, we obtain

$$\mathbf{w}_o = \left(\mathbf{R} - E[\mathbf{x}(n)]E[\mathbf{x}^{\mathrm{T}}(n)]\right)^{-1}\left(\mathbf{p} - E[s(n-\Delta)]E[\mathbf{x}(n)]\right) \tag{17.127}$$

and

$$b_o = E[s(n-\Delta)] - \mathbf{w}_o^{\mathrm{T}}E[\mathbf{x}(n)] \tag{17.128}$$

Note that the nonzero mean-value terms E[x(n)] and E[s(n − Δ)] arise as a result of
the a priori extrinsic information from the soft decoder. When there is no such a priori
information, E[x(n)] and E[s(n − Δ)] are both zero. Hence, $b_o = 0$; thus, the estimate
Eq. (17.118) will be unbiased and Eq. (17.127) reduces to the familiar solution
$\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p}$.

Using Eqs. (17.127) and (17.128), respectively, for w and b in Eq. (17.118), the optimum
estimate of s(n − Δ) that takes into account the a priori information from the
decoder is obtained as

$$\begin{aligned}
y_o(n) = {} & \left(\mathbf{p} - E[s(n-\Delta)]E[\mathbf{x}(n)]\right)^{\mathrm{T}}\left(\mathbf{R} - E[\mathbf{x}(n)]E[\mathbf{x}^{\mathrm{T}}(n)]\right)^{-1}\mathbf{x}(n) \\
& + E[s(n-\Delta)] - \mathbf{w}_o^{\mathrm{T}}E[\mathbf{x}(n)]
\end{aligned} \tag{17.129}$$

We also note that given the LLR values $\lambda_2^{e}$ from the soft decoder, E[s(n − Δ)] is
calculated as

$$E[s(n-\Delta)] = +1 \times P(s(n-\Delta) = +1) - 1 \times P(s(n-\Delta) = -1) \tag{17.130}$$

where P(s(n − Δ) = +1) and P(s(n − Δ) = −1) are related according to

$$\ln\frac{P(s(n-\Delta) = -1)}{P(s(n-\Delta) = +1)} = \lambda_2^{e}(s(n-\Delta)). \tag{17.131}$$

Also, noting that P(s(n − Δ) = +1) + P(s(n − Δ) = −1) = 1, one will find that

$$P(s(n-\Delta) = -1) = \frac{e^{\lambda_2^{e}(s(n-\Delta))}}{1 + e^{\lambda_2^{e}(s(n-\Delta))}} \tag{17.132}$$

and

$$P(s(n-\Delta) = +1) = \frac{1}{1 + e^{\lambda_2^{e}(s(n-\Delta))}} \tag{17.133}$$

Substituting Eqs. (17.132) and (17.133) into Eq. (17.130), we obtain

$$E[s(n-\Delta)] = \frac{1 - e^{\lambda_2^{e}(s(n-\Delta))}}{1 + e^{\lambda_2^{e}(s(n-\Delta))}}. \tag{17.134}$$
To obtain E[x(n)], we note that

$$\mathbf{x}(n) = \mathbf{H}\mathbf{s}(n) + \mathbf{v}(n) \tag{17.135}$$

where

$$\mathbf{H} = \begin{bmatrix}
h(L-1) & h(L-2) & \cdots & h(0) & 0 & \cdots & 0 \\
0 & h(L-1) & h(L-2) & \cdots & h(0) & \cdots & 0 \\
\vdots & & \ddots & \ddots & & \ddots & \vdots \\
0 & \cdots & 0 & h(L-1) & h(L-2) & \cdots & h(0)
\end{bmatrix} \tag{17.136}$$

s(n) is the vector of the transmitted symbols compatible with H, and the vector v(n)
contains the channel noise samples. Assuming that the channel noise has a zero mean,
from Eq. (17.135), one finds that

$$E[\mathbf{x}(n)] = \mathbf{H}E[\mathbf{s}(n)]. \tag{17.137}$$

We also note that the elements of E[s(n)] can be calculated using similar equations to
Eq. (17.134).
Next, we use $y_o(n)$ to calculate a new value of the LLR according to

$$\lambda_1(s(n-\Delta)) = \ln\frac{P(s(n-\Delta) = -1)}{P(s(n-\Delta) = +1)} \tag{17.138}$$

where P(s(n − Δ) = +1) and P(s(n − Δ) = −1) are related according to the equation

$$P(s(n-\Delta) = +1) - P(s(n-\Delta) = -1) = y_o(n) \tag{17.139}$$

and also we note that

$$P(s(n-\Delta) = +1) + P(s(n-\Delta) = -1) = 1. \tag{17.140}$$

Solving Eqs. (17.139) and (17.140) for P(s(n − Δ) = +1) and P(s(n − Δ) = −1) and
substituting the results in Eq. (17.138), we obtain

$$\lambda_1(s(n-\Delta)) = \ln\frac{1 - y_o(n)}{1 + y_o(n)}. \tag{17.141}$$

Finally,

$$\lambda_1^{e}(s(n-\Delta)) = \lambda_1(s(n-\Delta)) - \lambda_2^{e}(s(n-\Delta)) \tag{17.142}$$

is calculated and passed to the soft decoder for the next iteration.
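The scalar conversions between LLR values and symbol statistics used in this section can be sketched in Python as follows. The function names are hypothetical, and the clipping of $y_o(n)$ to the open interval (−1, 1) is an assumption added here to keep the logarithm finite; it is not part of the derivation above.

```python
import math

def mean_symbol_from_llr(llr):
    """E[s] = (1 - e^llr) / (1 + e^llr), which equals -tanh(llr / 2), as in Eq. (17.134)."""
    return -math.tanh(llr / 2.0)

def llr_from_estimate(y, eps=1e-12):
    """LLR ln((1 - y) / (1 + y)) of Eq. (17.141); y is clipped to (-1, 1) (assumption)."""
    y = max(-1.0 + eps, min(1.0 - eps, y))
    return math.log((1.0 - y) / (1.0 + y))

def extrinsic_llr(llr_equalizer, llr_prior):
    """Extrinsic part passed on to the soft decoder, as in Eq. (17.142)."""
    return llr_equalizer - llr_prior

# Example: a prior LLR of 1.3 favors b = 0, that is, s = -1, so the mean is negative.
prior = 1.3
mean_s = mean_symbol_from_llr(prior)
new_llr = llr_from_estimate(-0.9)
passed_on = extrinsic_llr(new_llr, prior)
```

The two conversions are inverses of one another, which is consistent with Eqs. (17.130) and (17.139) having the same form.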


17.9.2 Statistical Soft Equalizer 


The soft MMSE equalizer obtains a linear estimate (an average) of data symbols according
to Eq. (17.129) and uses that to calculate the corresponding LLR values according to
Eqs. (17.141) and (17.142).

The accurate estimate of the LLR value $\lambda_1$ of the data bit b(n) is given by

$$\lambda_1(b(n)) = \ln\frac{P(b(n) = 0 \mid \mathbf{x}, \mathbf{h}, \sigma_v^{2}, \boldsymbol{\lambda}_2^{e})}{P(b(n) = 1 \mid \mathbf{x}, \mathbf{h}, \sigma_v^{2}, \boldsymbol{\lambda}_2^{e})} \tag{17.143}$$

where x is the vector of the received signal, h is the vector of channel impulse response,
$\sigma_v^{2}$ is the channel noise variance, and $\boldsymbol{\lambda}_2^{e}$ is the vector of extrinsic LLR values of all the
transmitted data bits from the soft decoder.

Here, also, we assume that the modulator converts the coded bits b(n) to the BPSK
symbols that take values of −1, for b(n) = 0, and +1, for b(n) = 1. Considering this
direct mapping between b(n) and s(n), in the rest of this section, we use b(n) and s(n)
interchangeably, as appropriate. For instance, we use b(n) when reference is to be made to
LLR values, and use s(n) when the received signal samples are to be calculated according
to the channel equation

$$x(n) = \sum_{i=0}^{L-1} h(i)s(n-i) + v(n) \tag{17.144}$$

In the sequel, for brevity, we drop h and $\sigma_v^{2}$ from the equations and thus, to begin, we
rewrite Eq. (17.143) as

$$\lambda_1(b(n)) = \ln\frac{P(b(n) = 0 \mid \mathbf{x}, \boldsymbol{\lambda}_2^{e})}{P(b(n) = 1 \mid \mathbf{x}, \boldsymbol{\lambda}_2^{e})} \tag{17.145}$$

This may be further written as

$$\lambda_1(b(n)) = \ln\frac{\sum_{\mathbf{b}:b(n)=0} P(\mathbf{b} \mid \mathbf{x}, \boldsymbol{\lambda}_2^{e})}{\sum_{\mathbf{b}:b(n)=1} P(\mathbf{b} \mid \mathbf{x}, \boldsymbol{\lambda}_2^{e})} \tag{17.146}$$

where b is the vector of the transmitted symbols and “b : b(n) = 0” and “b : b(n) = 1”
mean that the summations are over all possible combinations of the information bits, subject
to b(n) = 0 and b(n) = 1, respectively.
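To proceed, we recall Bayes’ theorem, which for a pair of events A and B is
formulated as

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}. \tag{17.147}$$

Using Bayes’ theorem, Eq. (17.146) can be rearranged as

$$\begin{aligned}
\lambda_1(b(n)) &= \ln\frac{\displaystyle\sum_{\mathbf{b}:b(n)=0} \frac{p(\mathbf{x} \mid \mathbf{b}, \boldsymbol{\lambda}_2^{e})P(\mathbf{b} \mid \boldsymbol{\lambda}_2^{e})}{p(\mathbf{x} \mid \boldsymbol{\lambda}_2^{e})}}{\displaystyle\sum_{\mathbf{b}:b(n)=1} \frac{p(\mathbf{x} \mid \mathbf{b}, \boldsymbol{\lambda}_2^{e})P(\mathbf{b} \mid \boldsymbol{\lambda}_2^{e})}{p(\mathbf{x} \mid \boldsymbol{\lambda}_2^{e})}} \\
&= \ln\frac{\displaystyle\sum_{\mathbf{b}:b(n)=0} p(\mathbf{x} \mid \mathbf{b})P(\mathbf{b} \mid \boldsymbol{\lambda}_2^{e})}{\displaystyle\sum_{\mathbf{b}:b(n)=1} p(\mathbf{x} \mid \mathbf{b})P(\mathbf{b} \mid \boldsymbol{\lambda}_2^{e})}
\end{aligned} \tag{17.148}$$

where, in the second line, the condition “$\mathbf{b}, \boldsymbol{\lambda}_2^{e}$” has been reduced to “b,” because when b is
specified, the a priori information $\boldsymbol{\lambda}_2^{e}$ becomes irrelevant. In Eq. (17.148), we have used the
common notations where P(·) refers to the probability mass of a discrete random variable
and p(·) denotes a probability density of a continuous random variable. We also note that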
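$$\begin{aligned}
p(\mathbf{x} \mid \mathbf{b}) &= \prod_{k} p(x(k) \mid \mathbf{s}(k-L+1:k)) \\
&\propto \prod_{k} \exp\left(-\frac{1}{2\sigma_v^{2}}\left|x(k) - \sum_{i=0}^{L-1} h(i)s(k-i)\right|^{2}\right) \\
&= \exp\left(-\frac{1}{2\sigma_v^{2}}\sum_{k}\left|x(k) - \sum_{i=0}^{L-1} h(i)s(k-i)\right|^{2}\right)
\end{aligned} \tag{17.149}$$

On the other hand,

$$P(\mathbf{b} \mid \boldsymbol{\lambda}_2^{e}) = \prod_{k} P(b(k) \mid \lambda_2^{e}(b(k))). \tag{17.150}$$

Moreover, as in Eq. (17.132), here,

$$P(b(k) = 0 \mid \lambda_2^{e}(b(k))) = \frac{e^{\lambda_2^{e}(b(k))}}{1 + e^{\lambda_2^{e}(b(k))}} = \frac{e^{\lambda_2^{e}(b(k))/2}}{e^{-\lambda_2^{e}(b(k))/2} + e^{\lambda_2^{e}(b(k))/2}} \tag{17.151}$$

and

$$P(b(k) = 1 \mid \lambda_2^{e}(b(k))) = \frac{1}{1 + e^{\lambda_2^{e}(b(k))}} = \frac{e^{-\lambda_2^{e}(b(k))/2}}{e^{-\lambda_2^{e}(b(k))/2} + e^{\lambda_2^{e}(b(k))/2}} \tag{17.152}$$

Combining these results, we obtain

$$P(b(k) \mid \lambda_2^{e}(b(k))) = \frac{e^{(-1)^{b(k)}\lambda_2^{e}(b(k))/2}}{e^{-\lambda_2^{e}(b(k))/2} + e^{\lambda_2^{e}(b(k))/2}}. \tag{17.153}$$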


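Next, substituting Eqs. (17.149) and (17.150) in Eq. (17.148) and using Eq. (17.153),
we obtain

$$\begin{aligned}
\lambda_1(b(n)) &= \ln\frac{\displaystyle\sum_{\mathbf{b}:b(n)=0} \exp\left(-\frac{1}{2\sigma_v^{2}}\sum_{k}\left|x(k) - \sum_{i=0}^{L-1} h(i)s(k-i)\right|^{2}\right)\prod_{k}\frac{e^{(-1)^{b(k)}\lambda_2^{e}(b(k))/2}}{e^{-\lambda_2^{e}(b(k))/2} + e^{\lambda_2^{e}(b(k))/2}}}{\displaystyle\sum_{\mathbf{b}:b(n)=1} \exp\left(-\frac{1}{2\sigma_v^{2}}\sum_{k}\left|x(k) - \sum_{i=0}^{L-1} h(i)s(k-i)\right|^{2}\right)\prod_{k}\frac{e^{(-1)^{b(k)}\lambda_2^{e}(b(k))/2}}{e^{-\lambda_2^{e}(b(k))/2} + e^{\lambda_2^{e}(b(k))/2}}} \\
&= \ln\frac{\displaystyle\sum_{\mathbf{b}:b(n)=0} \exp\left(-\frac{1}{2\sigma_v^{2}}\sum_{k}\left|x(k) - \sum_{i=0}^{L-1} h(i)s(k-i)\right|^{2} + \sum_{k}(-1)^{b(k)}\lambda_2^{e}(b(k))/2\right)}{\displaystyle\sum_{\mathbf{b}:b(n)=1} \exp\left(-\frac{1}{2\sigma_v^{2}}\sum_{k}\left|x(k) - \sum_{i=0}^{L-1} h(i)s(k-i)\right|^{2} + \sum_{k}(-1)^{b(k)}\lambda_2^{e}(b(k))/2\right)}
\end{aligned} \tag{17.154}$$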
Once $\lambda_1(b(n))$ is calculated, it may be used to obtain $\lambda_1^{e}(b(n)) = \lambda_1(b(n)) - \lambda_2^{e}(b(n))$,
which is then passed to the soft decoder for the next iteration.
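For a very short block, the LLR can be evaluated by brute-force enumeration of all bit combinations, which makes the structure of the computation concrete. The following Python sketch assumes real-valued BPSK signals, a Gaussian exponent of the form $-|\cdot|^{2}/(2\sigma_v^{2})$, a received vector of length $N_b + L - 1$, and zero symbols outside the block; the function names are hypothetical.

```python
import itertools
import math

def bpsk(bit):
    """Mapping used in this section: b = 0 -> s = -1, b = 1 -> s = +1."""
    return -1.0 if bit == 0 else 1.0

def log_metric(bits, x, h, sigma2, prior_llr):
    """Log of one term under the summations: likelihood exponent plus prior part."""
    Nb, L = len(bits), len(h)
    s = [bpsk(b) for b in bits]
    dist = 0.0
    for k in range(Nb + L - 1):
        pred = sum(h[i] * s[k - i] for i in range(L) if 0 <= k - i < Nb)
        dist += (x[k] - pred) ** 2
    prior = sum(((-1) ** b) * lam / 2.0 for b, lam in zip(bits, prior_llr))
    return -dist / (2.0 * sigma2) + prior

def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def exact_llr(n, x, h, sigma2, prior_llr):
    """LLR of bit n by summing over all 2^Nb bit sequences (small Nb only)."""
    logs = {0: [], 1: []}
    for bits in itertools.product((0, 1), repeat=len(prior_llr)):
        logs[bits[n]].append(log_metric(bits, x, h, sigma2, prior_llr))
    return log_sum_exp(logs[0]) - log_sum_exp(logs[1])

# Example: three bits over a two-tap channel with no prior information.
llr0 = exact_llr(0, [-1.0, 0.4, 1.6, 0.5], [1.0, 0.5], 0.5, [0.0, 0.0, 0.0])
```

The loop over `itertools.product` visits $2^{N_b}$ sequences, so the sketch is usable only for a handful of bits.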

The main difficulty with the computation of $\lambda_1(b(n))$ according to Eq. (17.154) is that
the number of terms under the summations over “b : b(n) = 0” and “b : b(n) = 1”
grows exponentially with the length of b. With the typical length of b in the order of
at least a few hundred or, maybe, a few thousand, this clearly results in a prohibitive
complexity. The key idea in the development of statistical soft equalizers is that, in practice,
only a small subset of the choices of b contribute significantly to the terms under the
summations over “b : b(n) = 0” and “b : b(n) = 1.” Hence, a good estimate of $\lambda_1(b(n))$
may be obtained if one can find these important choices of b and limit the summations
to them. An approach that has been successfully adopted in the literature uses a search
method called the Gibbs sampler. The Gibbs sampler is a particular implementation of a
Markov chain Monte Carlo (MCMC) simulation that randomly walks through the subsets
of “b : b(n) = 0” and “b : b(n) = 1” that have significant contributions to the computation
of $\lambda_1(b(n))$. Once these subsets are obtained, they are expanded (as explained later)
and used to calculate the LLR values $\lambda_1(b(n))$. We refer to the combination of these two
steps as the MCMC equalizer. We also call the set of the choices of b obtained through the
Gibbs sampler the important sample set $\mathcal{I}$.
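
To make the exponential growth concrete, the following Python sketch evaluates Eq. (17.154) by exhaustive enumeration for a very short packet. It is only an illustration under stated assumptions: BPSK mapping (0 to −1, 1 to +1), a real-valued channel, noise variance `sigma2`, and the hypothetical function name `brute_force_llr`. None of these names come from the accompanying MATLAB programs.

```python
import itertools
import math

def brute_force_llr(x, h, lam2e, n, sigma2):
    """LLR of bit n obtained by summing over all 2**Nb bit vectors, Eq. (17.154).

    x      : received samples x(0) .. x(Nb + L - 2)
    h      : channel impulse response h(0) .. h(L - 1) (assumed real)
    lam2e  : extrinsic LLRs from the soft decoder, one per bit
    sigma2 : noise variance (assumed known)
    """
    Nb, L = len(lam2e), len(h)
    sums = [0.0, 0.0]                       # sums[a] collects terms with b(n) = a
    for b in itertools.product((0, 1), repeat=Nb):
        s = [2 * bk - 1 for bk in b]        # assumed BPSK mapping
        err = 0.0
        for k in range(Nb + L - 1):
            y = sum(h[i] * s[k - i] for i in range(L) if 0 <= k - i < Nb)
            err += (x[k] - y) ** 2
        prior = sum((-1) ** bk * lam / 2 for bk, lam in zip(b, lam2e))
        sums[b[n]] += math.exp(-err / (2 * sigma2) + prior)
    return math.log(sums[0] / sums[1])
```

The loop visits 2^Nb candidate vectors, so doubling the packet length squares the work; this is the cost that the MCMC equalizer is designed to avoid.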


Step 1: Gibbs Sampler 


Let us assume that a total of $N_b$ bits are transmitted; hence,
$\mathbf{b} = \{b(0), b(1), \ldots, b(N_b-1)\}$. The Gibbs sampler starts with a random
(or a deterministic) initialization $\mathbf{b} = \mathbf{b}^{(0)}$ and iterates through the
elements of b according to the following rule.

During the ith iteration, starting with
$\mathbf{b}^{(i-1)} = \{b^{(i-1)}(0), b^{(i-1)}(1), \ldots, b^{(i-1)}(N_b-1)\}$,
$\mathbf{b}^{(i)}$ is obtained by taking the following steps for each bit. Assuming that
$b^{(i)}(0)$ through $b^{(i)}(n-1)$ have been updated, we define

$$\mathbf{b}_{-n} = \{b^{(i)}(0), \ldots, b^{(i)}(n-1), b^{(i-1)}(n+1), \ldots, b^{(i-1)}(N_b-1)\}$$

and, for a = 0, 1, we let

$$\mathbf{b}^{a} = \{b^{(i)}(0), \ldots, b^{(i)}(n-1), a, b^{(i-1)}(n+1), \ldots, b^{(i-1)}(N_b-1)\}.$$

The selection of the bit $b^{(i)}(n)$ is based on the conditional probability distribution
$\gamma_a$, for a = 0, 1, where

$$\gamma_a = P\big(b(n) = a\,|\,\mathbf{b}_{-n}, \mathbf{x}, \lambda_2^e(b(n))\big).$$

That is, we choose $b^{(i)}(n) = 0$ with the probability $\gamma_0$ and $b^{(i)}(n) = 1$ with the
probability $\gamma_1$.
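Noting that b(n) maps to s(n), we have

$$\begin{aligned}
\gamma_a &= P\big(b(n) = a\,|\,\mathbf{b}_{-n}, \mathbf{x}, \lambda_2^e(b(n))\big) \\
&\propto p(\mathbf{x}\,|\,\mathbf{b}^a)\,P(b(n) = a) \\
&= p(\mathbf{x}\,|\,\mathbf{s}^a)\,P(b(n) = a) \\
&= \prod_{i=0}^{N_b+L-2} p\big(x(i)\,|\,\mathbf{s}^a(i-L+1:i)\big)\,P(b(n) = a) \\
&= \prod_{i=0}^{n-1} p\big(x(i)\,|\,\mathbf{s}^a(i-L+1:i)\big) \prod_{i=n+L}^{N_b+L-2} p\big(x(i)\,|\,\mathbf{s}^a(i-L+1:i)\big) \\
&\quad\times \prod_{i=n}^{n+L-1} p\big(x(i)\,|\,\mathbf{s}^a(i-L+1:i)\big)\,P(b(n) = a)
\end{aligned} \tag{17.155}$$

where $\mathbf{s}^a$ is obtained from a direct mapping of $\mathbf{b}^a$. Moreover, because for
$i \le n-1$ and $i \ge n+L$ the vector $\mathbf{s}^a(i-L+1:i)$ is independent of a,
Eq. (17.155) reduces to

$$\begin{aligned}
\gamma_a &\propto \prod_{i=n}^{n+L-1} p\big(x(i)\,|\,\mathbf{s}^a(i-L+1:i)\big)\,P(b(n) = a) \\
&= C \exp\left\{-\frac{1}{2\sigma_v^2}\sum_{i=n}^{n+L-1}\Big|x(i) - \sum_{l=0}^{L-1} h(l)\,s^a(i-l)\Big|^2 + (-1)^a\,\frac{\lambda_2^e(b(n))}{2}\right\}
\end{aligned} \tag{17.156}$$

where C is a scaling constant to ensure that $\gamma_0 + \gamma_1 = 1$.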
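
One sweep of the Gibbs sampler described above may be sketched in Python as follows. This is a minimal sketch, not the book's "MCMCEq.m": it assumes BPSK mapping (0 to −1, 1 to +1), a real-valued channel, a known noise variance `sigma2`, and the hypothetical function name `gibbs_sweep`. The constant C of Eq. (17.156) is handled implicitly by normalizing the two exponentials.

```python
import math
import random

def gibbs_sweep(b, x, h, lam2e, sigma2, rng):
    """One Gibbs iteration: resample every bit of b in place using Eq. (17.156)."""
    Nb, L = len(b), len(h)
    for n in range(Nb):
        metric = []
        for a in (0, 1):
            b[n] = a                            # candidate vector b^a
            err = 0.0
            for i in range(n, n + L):           # only x(n)..x(n+L-1) depend on b(n)
                y = sum(h[l] * (2 * b[i - l] - 1)
                        for l in range(L) if 0 <= i - l < Nb)
                err += (x[i] - y) ** 2
            metric.append(-err / (2 * sigma2) + (-1) ** a * lam2e[n] / 2)
        # gamma_1 = exp(m1) / (exp(m0) + exp(m1)), written to avoid overflow
        d = metric[0] - metric[1]
        gamma1 = 1.0 / (1.0 + math.exp(d)) if d < 700 else 0.0
        b[n] = 1 if rng.random() < gamma1 else 0
    return b
```

Calling `gibbs_sweep` repeatedly and storing each visited vector (for example, as a tuple in a Python set) yields a collection of samples of the kind that Step 2 below operates on.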


Step 2: Computation of LLR Values 


Assume that the Gibbs sampler has produced the important sample set $\mathcal{I}$. Each element in
$\mathcal{I}$ is a bit vector of length $N_b$. As bit b(n) is mapped to symbol s(n), the received signal
samples that are affected by b(n) are x(n : n + L − 1). As x(n : n + L − 1) depends only
on bits $\{b(l),\ n_1 = n-L+1 \le l \le n+L-1 = n_2\}$, we find that when computing the
output LLR for b(n), it is sufficient to truncate each sequence in $\mathcal{I}$ to take into account
only bits $\{b(l),\ n_1 \le l \le n_2\}$. We denote the set that contains the truncated sequences
by $\mathcal{I}_{n_1:n_2}$. For each $0 \le n \le N_b - 1$, we construct a larger set $\mathcal{I}^{n}_{n_1:n_2}$, which includes all
sequences in $\mathcal{I}_{n_1:n_2}$, together with new sequences that are obtained by flipping the nth bit
of each sequence in $\mathcal{I}_{n_1:n_2}$. Repeated sequences are removed from $\mathcal{I}^{n}_{n_1:n_2}$. Furthermore,
we let $\mathcal{I}^{n,0}_{n_1:n_2}$ and $\mathcal{I}^{n,1}_{n_1:n_2}$ denote the subsets of $\mathcal{I}^{n}_{n_1:n_2}$ whose nth bit equals 0 and 1,
respectively. The LLR value for bit b(n) is then computed as

$$\lambda_1(b(n)) = \ln\frac{\displaystyle\sum_{\mathbf{b}_{n_1:n_2}\in\,\mathcal{I}^{n,0}_{n_1:n_2}} p\big(\mathbf{x}(n:n+L-1)\,|\,\mathbf{b}_{n_1:n_2}\big)\prod_{l=n_1}^{n_2} P(b(l))}{\displaystyle\sum_{\mathbf{b}_{n_1:n_2}\in\,\mathcal{I}^{n,1}_{n_1:n_2}} p\big(\mathbf{x}(n:n+L-1)\,|\,\mathbf{b}_{n_1:n_2}\big)\prod_{l=n_1}^{n_2} P(b(l))}. \tag{17.157}$$

The performance of the MCMC equalizer depends on the quality of the samples in the
important set $\mathcal{I}$. In practice, a Gibbs sampler may require many iterations to converge
to its stationary distribution. This is called the burn-in period. As a result, including the
burn-in period in the implementation of the Gibbs sampler may increase the complexity
significantly. In Farhang-Boroujeny, Zhu and Shi (2006), it has been shown empirically
that formulas such as Eq. (17.157) still work well if the stationary distribution of
the underlying Markov chain is replaced by a uniform distribution over the significant
samples. To obtain samples with this uniform distribution, it has also been noted in
Farhang-Boroujeny, Zhu and Shi (2006) that a set of parallel Gibbs samplers with no
burn-in period and a small number of iterations is more effective than a single
Gibbs sampler with many iterations. The uniform distribution is obtained by removing the
repeated samples in $\mathcal{I}$.

To further guide the reader in understanding the above procedure, the MATLAB
script “MCMCTest.m” and the accompanying function “MCMCEq.m” are provided on the
accompanying website.


Parallel Implementation of MCMC Equalizer 


The Gibbs sampler that was presented earlier runs over a complete packet of the received
signal, x(0) through $x(L + N_b - 2)$, or, equivalently, over the information bits b(0) through
$b(N_b - 1)$, sequentially, one bit at a time. This sequential processing is repeated over a number
of iterations for each Gibbs sampler to obtain a sufficient number of samples of b. Also, as noted
earlier, a few Gibbs samplers may be run in parallel to obtain a more diverse (more balanced)
set of samples of b with uniform distribution.

From an implementation point of view, sequential processing requires a large number
of clock cycles that is proportional to the sequence length $N_b$. This (with typical values of
$N_b$ in the order of a few thousand), in turn, means a long processing time or, equivalently,
a long delay in the detection path. This, of course, is undesirable and should be avoided
if possible.

This problem can be solved by introducing a parallel implementation of the Gibbs
sampler that begins with partitioning the transmitted symbol vector $\mathbf{s}$ into a number of
smaller vectors (Peng, Chen, and Farhang-Boroujeny, 2010), say, $\mathbf{s}_1, \mathbf{s}_2, \ldots$, that we call
subsequences. Then, instead of running a single Gibbs sampler over the entire sequence
$\mathbf{s}$, we run parallel Gibbs samplers, one for each subsequence. Within each subsequence,
the Gibbs sampler updates the symbols sequentially from left to right. Owing to the
channel memory, the LLR value of a given symbol in a subsequence may depend on the
values of some symbols belonging to other subsequences. To speed up the convergence
of the Gibbs sampler, it has been empirically found in Peng, Chen, and Farhang-Boroujeny
(2010) that it is helpful to synchronize the parallel Gibbs samplers and
make the updated symbol values available to the neighboring subsequences. To show
how the parallel implementation works, we examine the example shown in Figure 17.30,
where $N_b = 8$ and $\mathbf{s}$ is partitioned into two subsequences. Hence, two Gibbs samplers are
run in parallel. The symbol updating process of these two Gibbs samplers is shown in
Table 17.3. At time 0, the samples are initialized randomly to $\mathbf{s}^{(0)}$. Then, at each time $t \ge 1$,
two new samples (one for each subsequence) are updated in parallel, conditioned upon
Figure 17.30 Partitioning of symbol sequence of length N = 8 to two subsequences, each of
length 4. The channel has a length of L = 3. The arrows show the dependency of the received
signal samples x(n) on the transmitted symbols s(n). The dashed arrows show the inter-subsequence
dependency.


Table 17.3 Samples updating in parallel Gibbs sampler.


     Gibbs Sampler 1                                       Gibbs Sampler 2
t    Update     Conditioned upon                           Update     Conditioned upon
0    Initialize s^(0) = {s^(0)(0), s^(0)(1), s^(0)(2), s^(0)(3), s^(0)(4), s^(0)(5), s^(0)(6), s^(0)(7)}
1    s^(1)(0)   s^(0)(1), s^(0)(2)                         s^(1)(4)   s^(0)(2), s^(0)(3), s^(0)(5), s^(0)(6)
2    s^(1)(1)   s^(1)(0), s^(0)(2), s^(0)(3)               s^(1)(5)   s^(0)(3), s^(1)(4), s^(0)(6), s^(0)(7)
3    s^(1)(2)   s^(1)(0), s^(1)(1), s^(0)(3), s^(1)(4)     s^(1)(6)   s^(1)(4), s^(1)(5), s^(0)(7)
4    s^(1)(3)   s^(1)(1), s^(1)(2), s^(1)(4), s^(1)(5)     s^(1)(7)   s^(1)(5), s^(1)(6)


the already updated symbols. The reader should carefully examine Table 17.3 to under- 
stand how parallel processing and the connection between the adjacent subsequences are 
being handled. 
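
The update order of Table 17.3 can be generated mechanically. The Python sketch below is an illustration only: it assumes equal-length subsequences and that each symbol is conditioned on all symbols within L − 1 positions of it (which is what the channel memory implies); the function name `parallel_gibbs_schedule` is hypothetical.

```python
def parallel_gibbs_schedule(num_symbols, L, num_samplers):
    """Return {t: [(updated index, neighbor indices), ...]}, one pair per sampler.

    At time t, sampler p updates the (t-1)th symbol of its own subsequence.
    The neighbors are the symbols that share a received sample with it.
    """
    sub_len = num_symbols // num_samplers
    schedule = {}
    for t in range(1, sub_len + 1):
        row = []
        for p in range(num_samplers):
            n = p * sub_len + (t - 1)          # symbol updated by sampler p at time t
            nbrs = [m for m in range(n - L + 1, n + L)
                    if m != n and 0 <= m < num_symbols]
            row.append((n, nbrs))
        schedule[t] = row
    return schedule


if __name__ == "__main__":
    for t, row in parallel_gibbs_schedule(8, 3, 2).items():
        print(t, row)
```

For N = 8, L = 3 and two samplers, the printed indices agree with the "Update" and "Conditioned upon" columns of Table 17.3; whether a neighbor carries its initial or its updated value follows from whether its own update time is earlier than t.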


17.9.3 Iterative Channel Estimation and Data Detection 


In the derivations made earlier, it was assumed that the channel response h was known 
perfectly. In practice, this is not the case. An initial estimate of the channel may be 
obtained through use of a training sequence that is transmitted at the beginning of each 
packet, as discussed in Section 17.6, or may be repeated periodically when the channel 
varies significantly over the duration of each packet. Such initial estimate(s) of the channel 
is (are) used to run the soft equalizer to produce the first set of LLR values, which will be 
passed to the soft decoder. The LLR values generated by the soft decoder are then used 
to refine the channel estimate before the next iteration of the soft equalizer. This process
is repeated until both the turbo equalizer and the channel estimator converge and hence the
best possible estimate of the transmitted data is obtained.

The following procedure explains how the LLR values (the soft information) from the 
soft decoder are used to refine the channel estimate. To keep the discussion simple, we 
assume, as before, that the modulator converts the coded bits b(n) to the BPSK symbols 
s(n) that take values of —1, for b(n) =0, and +1, for b(n) = 1. We also note that 
following the same line of thought that led to Eqs. (17.151) and (17.152), one obtains 


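$$P\big(s(n) = -1\,|\,\lambda_2(b(n))\big) = \frac{e^{\lambda_2(b(n))/2}}{e^{-\lambda_2(b(n))/2}+e^{\lambda_2(b(n))/2}} \tag{17.158}$$

and

$$P\big(s(n) = +1\,|\,\lambda_2(b(n))\big) = \frac{e^{-\lambda_2(b(n))/2}}{e^{-\lambda_2(b(n))/2}+e^{\lambda_2(b(n))/2}}. \tag{17.159}$$

Note that here we use $\lambda_2(\cdot)$ (not the extrinsic information $\lambda_2^e(\cdot)$).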
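On the other hand, we recall the least-squares cost function $\zeta$ of Chapter 12. When
the sequence s(n) is perfectly known, the channel estimate $\mathbf{h}$ is obtained by minimizing

$$\zeta = \sum_n \big|x(n) - \mathbf{h}^{\rm T}\mathbf{s}(n)\big|^2 \tag{17.160}$$

where $\mathbf{h} = [h(0)\ h(1) \cdots h(L-1)]^{\rm T}$ and $\mathbf{s}(n) = [s(n)\ s(n-1) \cdots s(n-L+1)]^{\rm T}$. When
the sequence s(n) is stochastic, as is the case here, Eq. (17.160) should be replaced by

$$\zeta = \sum_n E\big[\big|x(n) - \mathbf{h}^{\rm T}\mathbf{s}(n)\big|^2\big] \tag{17.161}$$

where the expectation is over the elements of the sequence s(n). Expanding Eq. (17.161),
we obtain

$$\zeta = \sum_n x^2(n) - 2\mathbf{h}^{\rm T}\sum_n E[\mathbf{s}(n)]\,x(n) + \mathbf{h}^{\rm T}\left(\sum_n E\big[\mathbf{s}(n)\mathbf{s}^{\rm T}(n)\big]\right)\mathbf{h}. \tag{17.162}$$

Setting the gradient of $\zeta$ with respect to $\mathbf{h}$ equal to zero gives

$$\mathbf{h} = \left(\sum_n E\big[\mathbf{s}(n)\mathbf{s}^{\rm T}(n)\big]\right)^{-1}\left(\sum_n E[\mathbf{s}(n)]\,x(n)\right). \tag{17.163}$$
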

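In Eq. (17.163), the elements of $\mathbf{s}(n)$ are independent random variables with the
statistics given in Eqs. (17.158) and (17.159). Accordingly, the ith element of $E[\mathbf{s}(n)]$ is
obtained as

$$\begin{aligned}
(E[\mathbf{s}(n)])_i &= E[s(n-i)] \\
&= (-1)\times P\big(s(n-i) = -1\,|\,\lambda_2(b(n-i))\big) + (+1)\times P\big(s(n-i) = +1\,|\,\lambda_2(b(n-i))\big).
\end{aligned} \tag{17.164}$$

Also, because s(n − i) and s(n − j) are independent for i ≠ j and $s^2(n-i) = 1$ for BPSK symbols,
the ijth element of $E[\mathbf{s}(n)\mathbf{s}^{\rm T}(n)]$ is obtained as

$$\big(E[\mathbf{s}(n)\mathbf{s}^{\rm T}(n)]\big)_{ij} = \begin{cases} 1, & i = j \\ E[s(n-i)]\,E[s(n-j)], & i \ne j \end{cases} \tag{17.165}$$

where E[s(n − i)] and E[s(n − j)] are computed according to Eq. (17.164).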
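
The estimator of Eqs. (17.163) to (17.165) may be sketched in a few lines of Python. The sketch assumes BPSK symbols and a real-valued channel, as in the text, and uses the fact that Eqs. (17.158), (17.159) and (17.164) combine to $E[s(n)] = -\tanh(\lambda_2(b(n))/2)$. The helper names (`soft_symbol_mean`, `solve`, `estimate_channel`) are hypothetical.

```python
import math

def soft_symbol_mean(lam):
    """E[s] for a BPSK symbol, from Eqs. (17.158), (17.159) and (17.164)."""
    return -math.tanh(lam / 2.0)

def solve(A, y):
    """Solve A h = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def estimate_channel(x, llr, L):
    """Channel estimate of Eq. (17.163); x[n] is the sample aligned with s(n)."""
    m = [soft_symbol_mean(v) for v in llr]
    R = [[0.0] * L for _ in range(L)]          # sum of E[s(n) s^T(n)], Eq. (17.165)
    p = [0.0] * L                              # sum of E[s(n)] x(n)
    for n in range(L - 1, len(llr)):           # skip samples with incomplete s(n)
        for i in range(L):
            p[i] += m[n - i] * x[n]
            for j in range(L):
                R[i][j] += 1.0 if i == j else m[n - i] * m[n - j]
    return solve(R, p)
```

When the LLR magnitudes are large, the soft means approach ±1 and the estimate reduces to the ordinary least-squares solution of Eq. (17.160); when they are small, the off-diagonal terms of R shrink and unreliable symbols contribute little to the estimate.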
17.10 Single-Input Multiple-Output Equalization


In certain wireless applications where the channel suffers from severe ISI and/or a high
level of noise and interference, the received signal is picked up by a number of antennas
that are spread apart sufficiently to observe a set of independent copies of the
transmitted signal. The channels formed in this way are often referred to as single-input
multiple-output (SIMO). Moreover, the receiver performance gain achieved as a result of
processing more than one copy of the received signal is called diversity gain.

In electromagnetic wireless channels, the use of multiple receive antennas often
falls under the category of smart antennas. As discussed in Chapter 6 (and developed
further in the next chapter), by combining the signals from various antennas, one
can form an antenna directivity that sees the desired signal and introduces nulls in the
directions of interfering signals. Moreover, SIMO channels have been found very useful
and are widely used in underwater acoustic (UWA) communication channels, where
electromagnetic waves are replaced by acoustic waves. This is because electromagnetic
waves cannot propagate far enough in water. The UWA channels are characterized by
long impulse responses with many nulls/fades within the band of interest. The presence
of nulls, obviously, degrades the equalizer performance. The rationale for the use of a SIMO
setup in this application is that it is very unlikely that all the transmission paths between
a transmit antenna and a few receive antennas undergo a fade at the same frequency.
Hence, the use of a SIMO setup provides enough diversity in the transmission paths to
ensure that adequate energy is received at all the frequencies over the transmission band.

To develop an insight into the concept of diversity gain, consider the SIMO channel
presented in Figure 17.31. Also, let us consider the simplified case where the channel
gains between the transmit antenna and all the M receive antennas are the same and
equal to one. However, the channel noise processes at the receive antennas are a set of
independent identically distributed (iid) random processes. Hence, the received signals
$x_0(n)$ through $x_{M-1}(n)$ are given as

$$x_i(n) = x(n) + v_i(n) \tag{17.166}$$


Figure 17.31 Single-input multiple-output channel.




where $v_i(n)$, for i = 0, 1, ..., M − 1, are a set of iid processes. Assuming that the
processes x(n) and $v_i(n)$ have zero mean, and using $\sigma_x^2$ and $\sigma_v^2$ to denote the respective
variances/powers, the signal-to-noise ratio (SNR) at each receive antenna is obtained as

$$\rho_i = \frac{\sigma_x^2}{\sigma_v^2}. \tag{17.167}$$

As the noise components $v_i(n)$ are uncorrelated with one another, one may argue that an
optimum receiver takes the average of the received signals $x_0(n)$ through $x_{M-1}(n)$ as the
best estimate of the transmitted signal x(n), viz.,

$$\hat{x}(n) = \frac{1}{M}\sum_{i=0}^{M-1} x_i(n) = x(n) + \frac{1}{M}\sum_{i=0}^{M-1} v_i(n). \tag{17.168}$$

Now, one may note that the variance of the noise term on the right-hand side of
Eq. (17.168) is equal to $\sigma_v^2/M$ and, hence, the SNR at the receiver output (i.e., after
averaging) is obtained as

$$\rho = \frac{\sigma_x^2}{\sigma_v^2/M} = M\rho_i. \tag{17.169}$$

In other words, the SNR after receiver processing improves by a gain of M, the number
of the receive antennas.
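
The gain of M in Eq. (17.169) is easy to confirm numerically. The short Python sketch below estimates the variance of the averaged noise term of Eq. (17.168) by simulation; the Gaussian noise model, the sample count and the function name `averaged_noise_variance` are assumptions made for illustration only.

```python
import random

def averaged_noise_variance(M, sigma2, num_samples, seed=0):
    """Sample variance of (1/M) * sum_i v_i(n) for iid zero-mean Gaussian v_i(n)."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    acc = 0.0
    for _ in range(num_samples):
        avg = sum(rng.gauss(0.0, sd) for _ in range(M)) / M
        acc += avg * avg
    return acc / num_samples


if __name__ == "__main__":
    for M in (1, 2, 4, 8):
        print(M, averaged_noise_variance(M, 1.0, 20000))
```

The printed variances should fall close to 1/M, so that the output SNR grows in proportion to the number of receive antennas.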

Next, let us consider the case where the channel has a gain of $g_i$ between the transmit
antenna and the ith receive antenna. Hence,

$$x_i(n) = g_i x(n) + v_i(n). \tag{17.170}$$

In this more general case, we still assume that the noise components $v_i(n)$ have the same
variance, equal to $\sigma_v^2$. We wish to find a combining gain vector $\mathbf{w}$ for obtaining the best
unbiased estimate of x(n) as

$$\hat{x}(n) = \mathbf{w}^{\rm H}\mathbf{x}(n) \tag{17.171}$$

where $\mathbf{x}(n) = [x_0(n)\ x_1(n) \cdots x_{M-1}(n)]^{\rm T}$. The unbiased estimate means that the mean of
$\hat{x}(n)$ must be equal to x(n). This, in turn, implies that the difference $x(n) - \hat{x}(n)$ should
be minimized subject to the constraint $\mathbf{w}^{\rm H}\mathbf{g} = 1$, where $\mathbf{g} = [g_0\ g_1 \cdots g_{M-1}]^{\rm T}$.

Now, defining the cost function

$$\begin{aligned}
\xi &= E\big[|x(n) - \hat{x}(n)|^2\big] \\
&= E\big[|x(n) - \mathbf{w}^{\rm H}\mathbf{x}(n)|^2\big]
\end{aligned} \tag{17.172}$$


and attempting to minimize it subject to the constraint $\mathbf{w}^{\rm H}\mathbf{g} = 1$, one will find that this
is the same constrained minimization problem as the one that we solved in Section 6.11,
using the method of Lagrange multipliers. Following the same line of derivations, here,
one will find the following solution

$$\mathbf{w}_{\rm o}^{\rm c} = \frac{1}{\mathbf{g}^{\rm H}\mathbf{g}}\,\mathbf{g} \tag{17.173}$$

where the superscript “c” is to emphasize that the solution is obtained under the constraint
condition.
The solution (17.173) leads to the following estimate of the desired signal, x(n):

$$\hat{x}(n) = \frac{\mathbf{g}^{\rm H}\mathbf{x}(n)}{\mathbf{g}^{\rm H}\mathbf{g}}. \tag{17.174}$$

This has the following interpretation. The estimate $\hat{x}(n)$ is obtained as a weighted average
of the elements of the received signal vector $\mathbf{x}(n)$ with weighting factors that are proportional
to the channel path gains between the transmit antenna and each of the receive antennas.
In other words, the paths with higher gains are given higher weights, because
they lead to less noisy signal components at the receiver. Consequently, the estimator
(17.174) is called the maximum ratio combiner.
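
As a small numerical check of Eqs. (17.173) and (17.174), the Python sketch below forms the combiner weights for a set of complex path gains and verifies the constraint $\mathbf{w}^{\rm H}\mathbf{g} = 1$. The gain values and the function names `mrc_weights` and `combine` are hypothetical and serve only to illustrate the formulas.

```python
def mrc_weights(g):
    """Combiner weights w = g / (g^H g) of Eq. (17.173) for complex path gains g."""
    energy = sum(abs(gi) ** 2 for gi in g)
    return [gi / energy for gi in g]

def combine(w, x):
    """Estimate x_hat(n) = w^H x(n) of Eq. (17.171)."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    g = [1 + 1j, 0.5 + 0j, -2j]                # hypothetical path gains
    w = mrc_weights(g)
    s = 0.7 - 0.3j                             # a noise-free transmitted sample
    print(combine(w, [gi * s for gi in g]))    # recovers s up to rounding
    # Output noise power is sigma_v^2 * sum(|w_i|^2) = sigma_v^2 / (g^H g)
    print(sum(abs(wi) ** 2 for wi in w))
```

With equal-variance noise at the antennas, the output noise power is $\sigma_v^2/(\mathbf{g}^{\rm H}\mathbf{g})$, which reduces to $\sigma_v^2/M$ when all gains equal one, in agreement with Eq. (17.169).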

Further generalization of the maximum ratio combiner may be suggested for the case 
where the noise (plus interference) components v;(n) have different powers/variances. 
The result will be that the combiner weight at each tap is proportional to the SNR (the 
square root of SNR, to be more exact) at that tap, as one would expect; see Problem P17 
at the end of this chapter. 

When the channel is a broadband one, and is hence characterized by a frequency-dependent
gain across the band of interest, the received signal at each antenna should be
passed through a transversal filter with several taps. Adding the outputs of these transversal
filters to form a single output results in a multiple-input single-output (MISO) equalizer.
Clearly, any of the adaptive algorithms that have been presented in the previous chapters
of this book can be applied for adaptation of the MISO equalizer as well. After adaptation,
the equalizer forms a frequency-dependent maximum ratio combiner that,
for each frequency, considers the SNR values at the various receive antennas and performs
the combining accordingly. We note that for a finite length equalizer, the maximum ratio
combining is only achieved to the extent allowed by the limited equalizer length. In other
words, only an approximate maximum ratio combiner can be realized. Problem P17, at
the end of this chapter, guides the reader to develop further insight into this property of
MISO equalizers through a numerical example.


17.11 Frequency Domain Equalization 


The frequency domain equalization method that is presented in this section has been proposed
as a means of reducing the complexity of the equalizer and, in some scenarios, of achieving
fast adaptation to channel variations. As discussed later, the frequency domain equalizer makes
use of certain properties of the discrete Fourier transform (DFT) to design a data packet
structure such that, when it is transmitted over an ISI channel, the ISI effects can be removed
through simple processing in the frequency domain.
17.11.1 Packet Structure


For reasons that will become clear later, to facilitate the frequency domain
equalization, the stream of transmit data symbols is partitioned into blocks of size N symbols.
A guard interval of a length equal to the duration of the channel impulse response or
greater is inserted between every two consecutive blocks. Moreover, each guard interval
is filled up by the symbols from the end of the succeeding data block. This concept is
presented in Figure 17.32.

To be more explicit, we note that, in Figure 17.32a, the nth block contains the vector

$$\mathbf{s}_n = \begin{bmatrix} s(nN) \\ s(nN+1) \\ \vdots \\ s(nN+N-1) \end{bmatrix}. \tag{17.175}$$

After the insertion of the guard interval and copying the symbols from the end of the
succeeding block, the result will be an extended block whose content is the vector

$$\tilde{\mathbf{s}}_n = \begin{bmatrix} s(nN+N-N_g) \\ \vdots \\ s(nN+N-1) \\ s(nN) \\ s(nN+1) \\ \vdots \\ s(nN+N-1) \end{bmatrix} \tag{17.176}$$

where $N_g$ is the length of the guard interval.


Figure 17.32 Packet structure for frequency domain equalization: (a) data stream before insertion
of guard intervals and (b) data stream after insertion of guard intervals.
One may note that $\tilde{\mathbf{s}}_n$ may be thought of as a length $N + N_g$ extraction from a periodic
sequence whose one period is $\mathbf{s}_n$. We will take advantage of this structure of $\tilde{\mathbf{s}}_n$, in the
next section, to develop a low-complexity frequency domain equalizer. Alternatively, one
may say that the vector $\tilde{\mathbf{s}}_n$ is obtained by copying the last $N_g$ elements of $\mathbf{s}_n$ to its beginning
as a cyclic prefix (CP). The name CP is thus used to refer to the data symbols that are
used to fill in the guard interval between each pair of successive data blocks.


17.11.2 Frequency Domain Equalizer 


To develop the structure of the desired frequency domain equalizer, consider the case
that an isolated (and extended) block $\tilde{\mathbf{s}}_n$ is passed through a channel with the impulse
response

$$\mathbf{h} = [h(0)\ h(1) \cdots h(L-1)]^{\rm T}. \tag{17.177}$$

The received signal, excluding the channel noise, will be the vector $\tilde{\mathbf{x}}_n$ obtained by
convolving $\tilde{\mathbf{s}}_n$ and the vector of the channel impulse response, $\mathbf{h}$. The vector $\tilde{\mathbf{x}}_n$ has
a length of $N + N_g + L - 1$. Also, because of the presence of the CP in $\tilde{\mathbf{s}}_n$, it is not
difficult to show that the elements $(N_g + 1)$st through $(N_g + N)$th of $\tilde{\mathbf{x}}_n$ (stored in a vector
$\mathbf{x}_n$) are obtained by circularly convolving the vectors $\mathbf{s}_n$ and $\mathbf{h}$. Mathematically, this is
written as

$$\mathbf{H}\mathbf{s}_n = \mathbf{x}_n \tag{17.178}$$

where $\mathbf{H}$ is a circular matrix whose first column is obtained by appending a number of
zeros to the end of $\mathbf{h}$ to extend its length to N.

Next, we argue that Eq. (17.178) is still valid when a sequence of cyclic prefixed
blocks $\tilde{\mathbf{s}}_n$ is transmitted. This is because the CP of each block absorbs the transient of
the channel, and the effect of the previous block will have faded out before we reach the
CP-stripped part of each block, namely, $\mathbf{x}_n$.
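
The claim behind Eq. (17.178) can be checked numerically. The following Python sketch prefixes a block with its last $N_g$ samples, passes it through a channel by ordinary (linear) convolution, strips the CP, and compares the result with a circular convolution. The symbol values, channel taps and function names are hypothetical; the sketch assumes a noise-free, real-valued channel.

```python
def add_cp(block, Ng):
    """Extended block: the last Ng symbols copied to the front (cyclic prefix)."""
    return block[-Ng:] + block

def linear_conv(a, b):
    """Ordinary convolution, i.e., the channel acting on the extended block."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def circular_conv(s, h):
    """N-point circular convolution of s with h (len(h) <= len(s))."""
    N = len(s)
    return [sum(h[l] * s[(k - l) % N] for l in range(len(h))) for k in range(N)]


if __name__ == "__main__":
    s = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]   # one block, N = 8
    h = [1.0, 0.5, -0.2]                               # channel, L = 3
    Ng = 3                                             # guard length >= channel duration
    x_tilde = linear_conv(add_cp(s, Ng), h)            # length N + Ng + L - 1
    x = x_tilde[Ng:Ng + len(s)]                        # elements (Ng+1)st .. (Ng+N)th
    print(max(abs(a - b) for a, b in zip(x, circular_conv(s, h))))
```

The printed maximum difference is zero up to rounding, which is the time-domain statement of Eq. (17.178).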

Recalling the theory of circular matrices, which was presented in Chapter 8, one may
note that the circular matrix $\mathbf{H}$ can be expanded as

$$\mathbf{H} = \mathcal{F}^{-1}\boldsymbol{\mathcal{H}}\mathcal{F} \tag{17.179}$$

where $\mathcal{F}$ is the DFT transformation matrix and $\boldsymbol{\mathcal{H}}$ is the diagonal matrix whose diagonal
elements are obtained by taking the DFT of the first column of $\mathbf{H}$. Substituting Eq.
(17.179) in Eq. (17.178) and rearranging the result, we obtain

$$\mathbf{s}_n = \mathcal{F}^{-1}\boldsymbol{\mathcal{H}}^{-1}\mathcal{F}\mathbf{x}_n. \tag{17.180}$$



This result has the following interpretation. Given a CP-stripped block of the received
signal, $\mathbf{x}_n$, the associated data symbol vector, $\mathbf{s}_n$, can be obtained by taking the following
steps:


1. Take the DFT of $\mathbf{x}_n$.

2. Point multiply the elements of the result from Step 1 by the inverse of the elements
of the DFT of the first column of $\mathbf{H}$.

3. Take the inverse DFT of the result in Step 2.

The following comments regarding the earlier steps are in order. Step 1 gives the
vector $\mathcal{F}\mathbf{x}_n$. Step 2 calculates the vector $\boldsymbol{\mathcal{H}}^{-1}\mathcal{F}\mathbf{x}_n$. This is equivalent to saying that a
single-tap frequency domain equalizer is applied to each of the frequency domain elements of
$\mathcal{F}\mathbf{x}_n$. Step 3 converts the equalized signal samples from the frequency domain to the time
domain.
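
The three steps can be sketched in Python as follows. For self-containment the sketch uses a direct O(N²) DFT instead of an FFT, assumes a noise-free channel with no spectral nulls at the DFT frequencies, and uses the hypothetical function names `dft` and `zf_equalize`.

```python
import cmath

def dft(v, inverse=False):
    """Direct N-point DFT (or inverse DFT) of the sequence v."""
    N = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [o / N for o in out] if inverse else out

def zf_equalize(x_block, h):
    """Recover s_n from the CP-stripped block x_n, following Eq. (17.180)."""
    N = len(x_block)
    H = dft(list(h) + [0.0] * (N - len(h)))   # DFT of the first column of H
    X = dft(x_block)                          # Step 1
    Y = [Xk / Hk for Xk, Hk in zip(X, H)]     # Step 2: one tap per frequency bin
    return dft(Y, inverse=True)               # Step 3


if __name__ == "__main__":
    s = [1, -1, 1, 1, -1, -1, 1, -1]
    h = [1.0, 0.5, -0.2]
    # CP-stripped received block = circular convolution of s and h
    x = [sum(h[l] * s[(k - l) % 8] for l in range(3)) for k in range(8)]
    print([round(v.real, 6) for v in zf_equalize(x, h)])
```

In a practical receiver the DFT and inverse DFT would be computed with an FFT, which is where the complexity saving of the frequency domain equalizer comes from.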

We also take note of the following points.


• The addition of the guard interval avoids interference among the data symbol
blocks $\mathbf{s}_n$.
• The use of the CP allows application of a trivial single-tap equalizer per frequency
domain sample.
• The presence of the guard interval/CP takes up a fraction of time, which otherwise
could be used for data transmission. This constitutes an overhead, which results in
a loss in the use of the available spectrum. This is quantified by the efficiency loss
$N_g/(N + N_g)$ or, equivalently, the efficiency $N/(N + N_g)$.
• The single-tap equalizer coefficients are the inverse of the samples of the N-point
DFT of the channel impulse response. Hence, to initialize/track the equalizer
coefficients, the channel impulse response should be estimated/tracked. The method of cyclic
equalization/channel estimation that was introduced in Section 17.6 can also be applied
here. For tracking, one may adopt a decision-directed method, where the decisions
from each block are used to update the channel estimate after processing of each block
$\mathbf{x}_n$. For fast varying channels, where the decision-directed method may fail, an
alternative method (an extension of cyclic channel estimation) is presented in the next
section.

The single-tap equalizer coefficients that are suggested in Eq. (17.180) (also Step 2
of the earlier procedure) ignore the channel noise and hence its possible amplification at
the frequency points where the channel gain is low. To avoid such noise enhancement,
the common solution is to adopt an MMSE equalizer. In that case, Eq. (17.180) is
replaced by

$$\hat{\mathbf{s}}_n = \mathcal{F}^{-1}\big(\boldsymbol{\mathcal{H}}\boldsymbol{\mathcal{H}}^{*} + \epsilon\mathbf{I}\big)^{-1}\boldsymbol{\mathcal{H}}^{*}\mathcal{F}\mathbf{x}_n \tag{17.181}$$

where $\epsilon$ is the ratio of the noise power to the signal power at the equalizer input.
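
Because the matrices in Eq. (17.181) are diagonal, the MMSE equalizer reduces to one complex coefficient per frequency bin. The Python sketch below computes these coefficients and contrasts them with the plain inverse at a deep fade; the channel values and the function name `mmse_taps` are hypothetical.

```python
def mmse_taps(H, eps):
    """Per-bin MMSE coefficients conj(H_k) / (|H_k|^2 + eps), from Eq. (17.181)."""
    return [Hk.conjugate() / (abs(Hk) ** 2 + eps) for Hk in H]


if __name__ == "__main__":
    H = [1 + 0j, 0.01 + 0j]              # second bin sits in a deep fade
    print(mmse_taps(H, 0.0))             # eps = 0 gives the inverse of Eq. (17.180)
    print(mmse_taps(H, 0.1))             # the faded bin is attenuated, not amplified
```

With ε = 0 the faded bin receives a gain of 100, which would amplify the noise in that bin by the same factor; with ε = 0.1 its gain drops below 0.1, at the cost of some residual ISI.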


17.11.3 Packet Structure for Fast Tracking


As discussed earlier, the cyclic equalization/channel estimation method that was
introduced in Section 17.6 may be used to initialize the coefficients of the equalizer at the
beginning of a communication session/a packet, and subsequently a decision-directed
method may be used to track variations of the channel for the rest of the session/packet.
However, in applications where the channel varies too fast and/or the channel noise is high,
decision errors may lead to a failure of the decision-directed method. In such cases, regular
training symbols may be inserted in the transmit signal. An interesting method of inserting
such training symbols has been discussed in Falconer et al. (2002). Here, we elaborate on this
method.

The packet structure presented in Figure 17.32 may be modified as the one presented
in Figure 17.33. Here, each data block $\mathbf{s}_n$ is preceded by a training block (TB) and
succeeded by the same TB. The TB is a fixed block of symbols known to both the transmitter
and the receiver. The length of the TB should be equal to or greater than the duration of the
channel impulse response. Each data block $\mathbf{s}_n$ combined with the succeeding TB has the length
of the DFT used for the frequency domain equalization. Accordingly, the concatenation
of $\mathbf{s}_n$ and its succeeding TB plays the same role as the data block $\mathbf{s}_n$ in Figure 17.32.
Moreover, the TB preceding $\mathbf{s}_n$ plays the role of the CP for the block consisting of $\mathbf{s}_n$
and its succeeding TB. In addition, and interestingly, in a pair of two adjacent TBs, the
first TB acts as a CP to the second TB. Therefore, the received version of the second TB
will be the circular convolution of the transmitted TB and the channel impulse response.
Consequently, a cyclic channel estimator, similar to the one in Figure 17.26, can be used
to estimate the channel impulse response $\mathbf{h}$. Finally, we note that this clever choice of the
packet structure comes at a price. As part of each data block should be given up in favor
of a TB, the efficiency reduces from $N/(N + N_g)$ to $(N - N_g)/(N + N_g)$.


Figure 17.33 Packet structure for fast tracking in frequency domain equalizers.


17.11.4 Summary 


Figure 17.34 presents a summary of our findings in this section. To facilitate the
implementation of a computationally efficient frequency domain equalizer, the transmit data
stream is partitioned into blocks of N symbols each. Each block is cyclic prefixed with
samples from the end of the same block. This extended stream is then passed through the
channel. The CP samples act as a guard interval that keeps the data blocks free of interblock
interference. They also play a key role in facilitating the task of frequency domain
equalization. Because of the choice of the CP samples, each received signal vector $\mathbf{x}_n$ is the
circular convolution of the symbol vector $\mathbf{s}_n$ and the channel impulse response vector $\mathbf{h}$.
Consequently, frequency domain equalization can be performed by taking the DFT of
each block $\mathbf{x}_n$, applying a single-tap equalizer per DFT output, and converting the results
back to the time domain through an IDFT.


Figure 17.34 Summary of the frequency domain equalization. [Figure: at the transmitter, the
transmit data stream is partitioned into blocks of size N symbols; at the receiver, the CPs are
stripped off from the received signal, a DFT is applied to each block, single-tap frequency
domain equalizers are applied, and an inverse DFT is applied to each block to obtain the
recovered data stream.]


17.12 Blind Equalization


In certain applications, for example, when a continuous stream of data is being broadcast,
there may be no training/preamble for tuning of the receiver to the incoming
signal. Equalizer adaptation in such applications, which must be performed without any
access to a desired signal, is referred to as blind equalization. In the literature, three classes
of blind equalization have been proposed. They are as follows:


1. Higher-Order Statistics-Based Methods: These methods obtain an estimate of the
impulse response of the equivalent baseband channel from the estimates of the third
or higher-order statistics of the received signal. Subsequently, using equations similar
to those developed in Section 17.4, the equalizer coefficients are calculated based on
the estimated channel.

2. Cyclostationary (Second-Order) Statistics-Based Methods: These methods take note of
the fact that in a digital communication system, the received signal is cyclostationary
and its statistics repeat after every symbol interval. This property allows one to obtain
an estimate of the impulse response of the equivalent baseband channel from the
estimates of the second-order statistics of the received signal. The equalizer coefficients
are then calculated based on the estimated channel.

3. Kurtosis-Based Methods: These methods take note that the quality of the output of
the equalizer can be measured by its kurtosis, a proper measure of its probability
distribution. The kurtosis-based methods have also been categorized under the class of
Bussgang algorithms (Haykin, 1996).


Of these, the third method is the most widely used in practice. In this section, we limit
our discussion to the kurtosis-based blind equalizers, with an attempt to provide some
intuitive understanding of how they work.


17.12.1 Examples of Kurtosis 


The common kurtosis that has been used for channel equalization is defined as follows.
For a random variable x with the probability distribution $p_x(x)$, the $l$th kurtosis of x is
defined as

$$\kappa_l = \frac{E[|x|^{2l}]}{\big(E[|x|^{l}]\big)^2} \tag{17.182}$$

where E[·] denotes the expected value. We also recall that

$$E[|x|^{l}] = \int_{-\infty}^{\infty} |x|^{l}\,p_x(x)\,dx. \tag{17.183}$$

In the case where x is complex-valued, $p_x(x)$ spans over the real and imaginary parts of
x and accordingly the integration on the right-hand side of Eq. (17.183) will be a double
integral.

The common choices of $l$ for blind equalizers are $l = 1$ and 2. Let us concentrate on
the choice of $l = 2$. Figure 17.35 presents a few choices of the distribution function $p_x(x)$
and their associated kurtosis $\kappa_2$. When x is normally distributed, $\kappa_2 = 3$. One may also
observe that $\kappa_2$ exceeds this value when $p_x(x)$ has a sharper peak and/or longer tails. On
the other hand, for distributions with a flatter peak and/or shorter tails, $\kappa_2 < 3$. These
observations may lead one to conclude that kurtosis provides a measure of the peakedness
of the probability distribution of a random variable.

To develop further insight, which may prove useful for the rest of our discussions in
this section, consider a few choices of pulse amplitude modulated (PAM) symbols. When
the symbols s(n) belong to a two-level PAM with equiprobable alphabet {+1, −1}, it is
straightforward to show that (when x = s(n)) $\kappa_2 = 1$. When the symbols s(n) belong to
a four-level PAM with equiprobable alphabet {+3, +1, −1, −3}, one can show that
$\kappa_2 = 1.64$. When the symbols s(n) belong to an eight-level PAM with equiprobable alphabet
{+7, +5, +3, +1, −1, −3, −5, −7}, one can show that $\kappa_2 = 1.7619$. Here, we make the
following observations.


• As the alphabet size of the PAM symbols increases, its kurtosis approaches the value
of 1.8, which is that of the uniform distribution; see Figure 17.35.


• The kurtosis $\kappa_2$ decreases as the number of levels in a PAM alphabet decreases.


Similar observations are made when x is a complex-valued random variable.
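
The kurtosis values quoted above follow directly from Eq. (17.182) when the expectations are replaced by averages over an equiprobable alphabet. The Python sketch below reproduces them; the function names `kurtosis` and `pam` are hypothetical.

```python
def kurtosis(alphabet, l=2):
    """l-th kurtosis of Eq. (17.182) for an equiprobable discrete alphabet."""
    n = len(alphabet)
    num = sum(abs(a) ** (2 * l) for a in alphabet) / n      # E[|x|^(2l)]
    den = (sum(abs(a) ** l for a in alphabet) / n) ** 2     # (E[|x|^l])^2
    return num / den

def pam(levels):
    """PAM alphabet {-(levels-1), ..., -1, +1, ..., +(levels-1)}."""
    return list(range(-(levels - 1), levels, 2))


if __name__ == "__main__":
    for levels in (2, 4, 8, 64):
        print(levels, round(kurtosis(pam(levels)), 4))
```

For the four-level alphabet, for example, $E[|x|^2] = 5$ and $E[|x|^4] = 41$, so that $\kappa_2 = 41/25 = 1.64$; larger alphabets move toward the uniform-distribution value of 1.8.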
Figure 17.35 Probability distribution functions of a few random variables and their associated
kurtosis values. (a) Gaussian distribution, $p_x(x) = e^{-\pi x^2}$; (b) Uniform distribution, $p_x(x) = 1$ in
the interval −0.5 < x < 0.5; (c) Half-cosine distribution, $p_x(x) = \cos(\pi x)$ in the interval
−0.5 < x < 0.5; (d) Exponential distribution, $p_x(x) = 2e^{-4|x|}$.


17.12.2 Cost Function 


Following the earlier observations, we may argue that to develop an adaptive blind equal- 
ization algorithm one can evaluate the kurtosis of the equalizer output and adjust the 
equalizer coefficients in the direction that reduces the kurtosis. The rationale here is that 
an unequalized output that suffers from a significant level of ISI has a Gaussian-like dis- 
tribution, hence, a large kurtosis. As the adaptation proceeds and the equalizer coefficients 
are adjusted to reduce kurtosis of the resulting output, it will expectedly converge in the 
direction where the final outputs will be those of the transmitted PAM/QAM symbols, as 
the PAM/QAM symbols offer the smallest kurtosis. 

Unfortunately, because of the rather complex form of the kurtosis expression (being the 
ratio of two expectations), the derivation of a kurtosis-based adaptation algorithm may 
not lead to a fruitful product. Instead, the following cost function has been suggested for 
development of the practical blind equalizers. 


$$\xi = E\left[(|y(n)|^l - \gamma)^2\right] \tag{17.184}$$

where $y(n)$ is the equalizer output and $\gamma$ is a constant whose value will be derived later.
In the case where the data symbols $s(n)$ are chosen from an alphabet where $|s(n)|$ is a
constant, say, equal to $r$, the choice of the cost function

$$\xi = E\left[(|y(n)|^l - r^l)^2\right] \tag{17.185}$$

and its minimization leads to an equalizer output that satisfies $|y(n)| = r$, because this
reduces $\xi$ to zero; an absolute minimum for $\xi$. Of course, this ideal scenario is
possible only when the channel noise is zero and the equalizer length approaches infinity.
Nevertheless, this argument provides some convincing reasoning on why the cost function
(17.184) can be effective.

In the more general case where the data symbols s(n) are from a general QAM con- 
stellation, where |s(n)| varies with n, a dual to Eq. (17.185) is the cost function 


$$\xi = E\left[(|y(n)|^l - |s(n)|^l)^2\right]. \tag{17.186}$$

This of course is not a practical cost function, as it requires the knowledge of $|s(n)|$, a
quantity that is unknown to the receiver by the virtue of the blind equalization. Instead,
in Eq. (17.184), one may choose the parameter $\gamma$ such that when $\xi$ is minimized,
$y(n)$ converges toward a process with the same statistical characteristics as the transmit
symbols $s(n)$.
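Recall that

$$y(n) = \mathbf{w}^{\mathrm H}(n)\mathbf{x}(n) \tag{17.187}$$

where $\mathbf{w}(n)$ and $\mathbf{x}(n)$ are the equalizer tap-weight vector and tap-input vector, respectively.
Also, using Eq. (17.187), we obtain

$$|y(n)|^2 = \mathbf{w}^{\mathrm H}(n)\mathbf{x}(n)\mathbf{x}^{\mathrm H}(n)\mathbf{w}(n). \tag{17.188}$$

Substituting Eq. (17.188) in Eq. (17.184), we get

$$\xi = E\left[\left((\mathbf{w}^{\mathrm H}(n)\mathbf{x}(n)\mathbf{x}^{\mathrm H}(n)\mathbf{w}(n))^{l/2} - \gamma\right)^2\right]. \tag{17.189}$$

Upon convergence of the blind equalizer, when $\xi$ is minimized, the equality

$$\nabla_{\mathbf{w}}^{C}\,\xi = \mathbf{0} \tag{17.190}$$

should be satisfied. In Eq. (17.190), following the notation introduced in Chapter 3, $\nabla_{\mathbf{w}}^{C}\,\xi$
denotes the gradient of $\xi$ with respect to the complex-valued vector $\mathbf{w}$. Moreover, recalling
from Chapter 3 the identity $\nabla_{\mathbf{w}}^{C}\,\mathbf{w}^{\mathrm H}\mathbf{R}\mathbf{w} = 2\mathbf{R}\mathbf{w}$, which is valid for any arbitrary matrix $\mathbf{R}$,
substituting Eq. (17.189) in Eq. (17.190) and letting $\mathbf{w}(n) = \mathbf{w}_o$, we obtain

$$E\left[2(l/2)\,\mathbf{x}(n)\mathbf{x}^{\mathrm H}(n)\mathbf{w}_o\,(\mathbf{w}_o^{\mathrm H}\mathbf{x}(n)\mathbf{x}^{\mathrm H}(n)\mathbf{w}_o)^{l/2-1}\left((\mathbf{w}_o^{\mathrm H}\mathbf{x}(n)\mathbf{x}^{\mathrm H}(n)\mathbf{w}_o)^{l/2} - \gamma\right)\right] = \mathbf{0}. \tag{17.191}$$

Next, multiplying Eq. (17.191) from left by $\mathbf{w}_o^{\mathrm H}$, substituting $\mathbf{w}_o^{\mathrm H}\mathbf{x}(n)$ by $y_o(n)$ (the output
$y(n)$ when $\mathbf{w}(n)$ is replaced by its optimum value $\mathbf{w}_o$), and removing the constant factor
$2(l/2)$, we get

$$E\left[|y_o(n)|^l\,(|y_o(n)|^l - \gamma)\right] = 0. \tag{17.192}$$

Solving this equation for $\gamma$, we obtain

$$\gamma = \frac{E[|y_o(n)|^{2l}]}{E[|y_o(n)|^{l}]} \tag{17.193}$$

Upon convergence of the equalizer tap weights, assuming that they have converged to the
desired values, $y_o(n) \approx s(n)$, hence, one may argue that the choice of

$$\gamma = \frac{E[|s(n)|^{2l}]}{E[|s(n)|^{l}]} \tag{17.194}$$

is reasonable.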


17.12.3 Blind Adaptation Algorithm


Starting with the cost function $\xi$ and following the LMS algorithm philosophy to replace
$\xi$ by its coarse estimate $\hat{\xi} = (|y(n)|^l - \gamma)^2$ in a steepest descent recursive equation, we
obtain

$$\mathbf{w}(n+1) = \mathbf{w}(n) - l\mu\, y^*(n)|y(n)|^{l-2}(|y(n)|^l - \gamma)\mathbf{x}(n) \tag{17.195}$$

where $\mu$ is a step-size parameter. For the choice of $l = 1$, Eq. (17.195) reduces to

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\,\frac{y^*(n)}{|y(n)|}(|y(n)| - \gamma)\mathbf{x}(n) \tag{17.196}$$

with $\gamma = E[|s(n)|^2]/E[|s(n)|]$. For the choice of $l = 2$, on the other hand, we obtain

$$\mathbf{w}(n+1) = \mathbf{w}(n) - 2\mu\, y^*(n)(|y(n)|^2 - \gamma)\mathbf{x}(n) \tag{17.197}$$

with $\gamma = E[|s(n)|^4]/E[|s(n)|^2]$.

One may note that the choice of $l = 2$ has a simpler form than the choice of $l = 1$.
This is because the computation of $|y(n)| = \sqrt{|y(n)|^2} = \sqrt{y_R^2(n) + y_I^2(n)}$ is more involved
than the computation of $|y(n)|^2 = y_R^2(n) + y_I^2(n)$ and besides, there is a divide by $|y(n)|$
in Eq. (17.196), which is absent in Eq. (17.197). Moreover, in practice $l = 2$ works
better than the case of $l = 1$. The programs “Blind_equalizer_11.m” and
“Blind_equalizer_12.m” for experimenting with the blind equalizers with choices of $l = 1$
and $l = 2$, respectively, are available on the accompanying website of this book.
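As an illustration of the recursion for $l = 2$, the following is a minimal NumPy sketch of one update step of Eq. (17.197). It is not the book's MATLAB program; the function name `blind_update`, the one-tap channel gain `g`, the QPSK alphabet, and the step size are assumptions chosen only to give a small, checkable toy case.

```python
import numpy as np


def blind_update(w, x, mu, gamma):
    """One step of the l = 2 recursion, Eq. (17.197).

    w, x  : complex tap-weight and tap-input vectors (same length)
    mu    : step-size parameter
    gamma : E[|s|^4] / E[|s|^2] of the transmit constellation
    """
    y = np.vdot(w, x)                      # y(n) = w^H(n) x(n)
    w_next = w - 2 * mu * np.conj(y) * (abs(y) ** 2 - gamma) * x
    return w_next, y


# Toy check (assumed setup): QPSK symbols through a one-tap channel with
# complex gain g, followed by a one-tap equalizer.
rng = np.random.default_rng(0)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
gamma = np.mean(abs(qpsk) ** 4) / np.mean(abs(qpsk) ** 2)
g = 0.6 * np.exp(0.3j)
w = np.array([1.0 + 0j])
for _ in range(2000):
    s = rng.choice(qpsk)
    w, y = blind_update(w, np.array([g * s]), mu=0.01, gamma=gamma)

# Magnitude of the combined channel/equalizer gain after adaptation.
print(abs(np.conj(w[0]) * g))
```

In this toy case the magnitude of the combined gain settles at one while its phase is left undetermined, which is consistent with the cost function depending only on $|y(n)|$.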


Problems 


P17.1 A naive pulse shape that satisfies the Nyquist condition is
$p(t) = p_{\mathrm T}(t) \star p_{\mathrm R}(t) = \Lambda(t/T)$, where

$$\Lambda\left(\frac{t}{T}\right) = \begin{cases} 1 + t/T, & -T < t < 0 \\ 1 - t/T, & 0 \le t < T \\ 0, & \text{otherwise.} \end{cases}$$

For this choice of $p(t)$ and an ideal channel $c(t) = \delta(t)$, consider the received
signal

$$x(t) = \sum_{n=-\infty}^{\infty} s(n)p(t - nT).$$

(i) Assuming that the data symbols $s(n)$ are independent of one another and
$E[|s(n)|^2] = \sigma_s^2$, evaluate and obtain an expression for

$$\rho(\tau) = E[|x(nT + \tau)|^2]$$

for $0 \le \tau \le T$.

(ii) Present a plot of $\rho(\tau)$. You should find that in contrast to the fundamental
results presented in Section 17.3, where $\rho(\tau)$ was a biased sine wave, here,
$\rho(\tau)$ has a different form. Explain what the source of this discrepancy is.

(iii) What is the value of $\tau$ that maximizes $\rho(\tau)$? For this choice of $\tau$, find
samples of $x(t)$ at the sampling times $nT + \tau$ and show that such a choice results
in zero ISI. It is, thus, the optimum timing phase.

P17.2 Run the MATLAB script “TxRxQAMELG.m,” available on the accompanying
website, for a number of choices of the parameters $\mu$, $\delta\tau$, and £, to gain a more
in-depth understanding of the early-late gate timing recovery algorithm, and to
confirm the results presented in Figures 17.16 and 17.17.

P17.3 Run the MATLAB script “TxRxQAMGB.m,” available on the accompanying
website, for a number of choices of the parameters $\mu$ and £, to gain a more
in-depth understanding of the gradient-based gate timing recovery algorithm,
and to confirm the result presented in Figure 17.18.

P17.4 (i) Following the same line of derivations to those in Section 17.4, derive the
relevant equations for designing a DF equalizer with a half symbol-spaced
feedforward filter.

(ii) Add a new section to the end of the MATLAB script “equalizer_eval.m”
to numerically examine the results of the derivations of (i).

(iii) By examining the modified script, confirm the results presented in
Figures 17.23 and 17.24 and compare them with those of the
fractionally spaced DF equalizer that you developed.

P17.5 Run the MATLAB script “equalizer_eval.m” on the accompanying website
to confirm the results presented in Section 17.4.2. Run the script for a few
additional choices of channel (that you choose) to develop further insight to the
performance of the various equalizers.

P17.6 Develop a MATLAB program to study the convergence behavior of the
LMS algorithm when applied to the channel equalization problem. Examine
and present the learning curves of the algorithm for the more common and
the less common channels that are listed in the MATLAB script
“equalizer_eval.m.” In particular, examine the equalizer lengths mentioned in
Figures 17.23 and 17.24 and confirm that for the various choices of the timing
phase, the mean-squared error of the equalizer converges toward the values
presented in these figures.

P17.7 Repeat Problem P17.6 when the RLS algorithm is used for the adaptation of the
equalizer tap weights.
P17.8 Repeat Problem P17.6 when the LMS-Newton algorithm is used for the adaptation
of the equalizer tap weights.

P17.9 Repeat Problem P17.6 when the affine projection LMS algorithm is used for the
adaptation of the equalizer tap weights.

P17.10 Present a detailed derivation of Eq. (17.111).

P17.11 Consider the system setup of Figure 17.28. Let the channel noise be absent and
the impulse response of the channel be

$$\mathbf{h} = [0.1\ \ {-0.2}\ \ 0.3\ \ 0.7\ \ 1\ \ 0.8\ \ {-0.5}\ \ {-0.2}\ \ 0.1]^{\mathrm T}$$

(i) Find the equalizer vector $\mathbf{w}$ that truncates the channel response to a length
of $L = 3$. Let $\Delta = 5$. To confirm your solution compare the sequences
$h(n) \star w_n$ and $d_n$.

(ii) Develop an LMS algorithm for joint adaptation of $\mathbf{w}$ and $\mathbf{d}$, and confirm
that it converges to the solution you obtained in (i).

P17.12 Present a derivation of the solution (17.173).

P17.13 Figure 17.26 presents a structure for identifying the channel impulse response
at the spacing $T$. Suggest a method of modifying this structure for identifying
the channel impulse response at the spacing $T/L$, for an arbitrary integer $L$.

P17.14 Starting with the MATLAB script “CyclicEq_eval.m” on the accompanying
website, extend it to confirm the results presented in Table 17.2.

P17.15 Develop a MATLAB code to obtain an estimate of the equivalent baseband
impulse response of a channel by sending a single symbol $s(n) = \delta(n)$ and
taking the received signal samples at the required rate, say, at the rate of $1/T$.
Compare the result with what you obtain through the use of the MATLAB script
“CyclicEq_eval.m” and confirm that both give the same channel estimate
when channel noise is absent.

P17.16 In Figure 17.31, let the received signals $x_k(n)$ be related to the transmit signal
$x(n)$ according to the equations

$$x_k(n) = g_k x(n) + v_k(n), \quad \text{for } k = 0, 1, \ldots, M-1$$

where $v_k(n)$ is a white noise with variance $\sigma_k^2$. We wish to design an
unbiased estimator for $x(n)$ using the received signal samples $x_k(n)$, for
$k = 0, 1, \ldots, M-1$. To this end, one may first multiply both sides of the
above equation by $1/\sigma_k$ to obtain

$$x'_k(n) = g'_k x(n) + v'_k(n), \quad \text{for } k = 0, 1, \ldots, M-1$$

where $x'_k(n) = x_k(n)/\sigma_k$, $g'_k = g_k/\sigma_k$, and the noise terms $v'_k(n)$, for all $k$, have
unit variance.

(i) Design the maximum ratio combiner for the unbiased minimum variance
estimate of $x(n)$ from the signal samples $x'_k(n)$.

(ii) Using the result of (i) find the maximum ratio combiner for the unbiased
minimum variance estimate of $x(n)$ from the signal samples $x_k(n)$.

P17.17 Consider a SIMO channel with two receive antennas. Figure P17.13 presents a
block diagram of the complete communication system, including a multiple-input
single-output (MISO) equalizer at the receiver. Let the channel noise processes,
$v_0(n)$ and $v_1(n)$, be identically independent white noise with the variance $\sigma_v^2$.
Also, assume that the transmit data sequence $s(n)$ is a white random sequence
with variance $\sigma_s^2$.

(i) Assuming that the equalizers $W_0(z)$ and $W_1(z)$ are unconstrained in duration,
show that the optimum choices of them that minimize the mean-squared
error $E[|s(n) - \hat{s}(n)|^2]$ are given by

$$W_{0,o}(z) = \frac{\sigma_s^2 H_0^*(z)}{\sigma_s^2\left(|H_0(z)|^2 + |H_1(z)|^2\right) + \sigma_v^2}$$

$$W_{1,o}(z) = \frac{\sigma_s^2 H_1^*(z)}{\sigma_s^2\left(|H_0(z)|^2 + |H_1(z)|^2\right) + \sigma_v^2}$$

Hint: Refer to Chapter 3 to recall how to derive unconstrained
Wiener-Hopf equations.

(ii) Simplify $W_{0,o}(z)$ and $W_{1,o}(z)$ for the case where $H_0(z) = 1 + z^{-1}$,
$H_1(z) = 1 - z^{-1}$, $\sigma_s^2 = 1$, and $\sigma_v^2 = 0.1$.

(iii) By presenting the plots of the magnitude responses of $W_{0,o}(z)$ and $W_{1,o}(z)$
and comparing them with those of $H_0(z)$ and $H_1(z)$, respectively, explain
how maximum ratio combining is established in the numerical case of (ii).

Figure P17.13 (block diagram; block labels: SIMO channel, MISO equalizer)


P17.18 Following the steps summarized in Figure 17.34, develop a MATLAB script to 
implement a communication system that is equipped with a frequency domain 
equalizer and test its correct operation. 


P17.19 The MATLAB script “Blind_equalizer_11.m,” on the accompanying
website, allows you to test the performance of the kurtosis based equalizer
for the case where the parameter $l = 1$. To explore the convergence behavior
of the stochastic adaptation algorithm (17.195), examine this program for
various constellation sizes and the step-size parameter $\mu$. Note that for larger
constellations, convergence may occur only after many hundreds of thousands
or even millions of iterations. Hence, work patiently.


P17.20 Repeat Problem P17.19, when “Blind_equalizer_11.m” is replaced by
“Blind_equalizer_12.m,” also available on the accompanying website.


18 


Sensor Array Processing 


Sensor array processing has broad applications in diverse fields such as wireless communi- 
cations, microphone arrays, radar, seismic and underwater explorations, radio astronomy, 
and radio surveillance. Direction of arrival (DOA) estimation and beam forming are two
basic areas of sensor array processing that are common to its diverse applications. In 
this chapter, we review a number of signal processing techniques that over the past few 
decades have been developed for DOA estimation and/or adaptive beam forming. 

Sensor array processing techniques have been developed and applied to both narrow- 
band and broadband signals. However, the underlying concepts (which are common to 
both narrowband and broadband cases) are more easily understood when presented in the 
context of narrowband processing methods and algorithms. Noting this, most of the devel- 
opments in this chapter are first presented in the context of narrowband signals. Extensions, 
which will be presented in the later parts of the chapter, will then be straightforward. 

Implementation of a sensor array involves design and fabrication of sensor elements 
and the attached electronics, placement of the sensors according to a desired geometric 
topology, and development of a proper signal processing algorithm. The subject of this 
chapter is related to the latter, that is, development of signal processing algorithms for 
sensor arrays. In this development, idealistic assumptions are often made about the fabricated
sensors and their positions according to the assumed geometric topology. This
may not be the case in practice. Hence, some tolerance should be admitted for each 
fabricated sensor element. Also, the placement of the sensor elements according to the 
desired geometric topology may not be realized exactly. Clearly, these variations have 
to be considered in the development of the signal processing algorithms. In other words, 
robustness has to be considered when an algorithm is developed for a sensor array. In this 
chapter, we first develop a number of signal processing algorithms/techniques, assuming 
that all the sensor elements match their assumed, idealized model and also are positioned 
exactly according to a desired geometric topology. The issue of robustness is discussed in 
a later section of the chapter, where some modifications to the basic algorithms/techniques 
that result in robust sensor arrays are also presented. 
18.1 Narrowband Sensor Arrays


18.1.1 Array Topology and Parameters 


The simplest and the most considered array geometric topology is the so-called linear 
array. In a linear array, the array elements are positioned on a straight line and often the 
elements are equally spaced. Figure 18.1 presents one such topology with M elements. 
The elements are spaced at a distance of $l$ m. Following the presentations in Chapters 3
and 6, here too we have chosen to present the array elements as omni-directional antennas. 
However, we note that this is only for convenience and the results presented in this chapter 
are applicable to the variety of applications mentioned at the beginning of this chapter. 
Clearly, other array geometric topologies are also possible. Two examples are presented 
in Figures 18.2 and 18.3. These topologies may be found useful in cases where the space 
for placement of the sensor elements is limited. 

In Figure 18.1, it is assumed that a plane-wave signal is impinging on the sensor elements
from an angle of $\theta$. In this section, we assume that $x(n) = \alpha(n)\cos(\omega_0 n + \phi)$ is a
narrowband signal centered around frequency $\omega_0$. The fact that $x(n)$ is narrowband implies
that $\alpha(n)$ varies slowly with the time index $n$. Moreover, in Figure 18.1, $x_0(n)$ through
$x_{M-1}(n)$ are phasor representations of the received signal at the array element outputs. As
discussed in Section 6.10, such phasor signals are recovered at each sensor element output
and will be available to the sensor array processor, that is, the algorithms that are discussed
in this chapter. The parameters $w_0$ through $w_{M-1}$ are a set of complex-valued coefficients
that we refer to as the array gains. Accordingly, we define the array gain vector

$$\mathbf{w} = [w_0^*\ \ w_1^*\ \ \cdots\ \ w_{M-1}^*]^{\mathrm H} \tag{18.1}$$

Figure 18.1 Linear geometric topology for sensor arrays.

Figure 18.2 Circular geometric topology for sensor arrays.

Figure 18.3 Regular grid geometric topology for sensor arrays.


where the superscript H denotes complex-conjugate transpose or Hermitian. Also, we
define the array phasor vector (baseband signal vector)

$$\mathbf{x}(n) = [x_0(n)\ \ x_1(n)\ \ \cdots\ \ x_{M-1}(n)]^{\mathrm T} \tag{18.2}$$

where the superscript T denotes transpose.
Let the phasor signal at the sensor element $k$ be written as

$$x_k(n) = \alpha(n)e^{j\phi_k} \tag{18.3}$$


For the case presented in Figure 18.1, the plane-wave signal $x(n)$ reaches element 0 first
and reaches element 1 $(l\sin\theta)/c$ seconds later, where $c$ is
the propagation speed. Similarly, for the rest of the elements, one will find that

$$\phi_k - \phi_{k-1} = \frac{\omega_0\, l \sin\theta}{cT} \tag{18.4}$$
where $T$ is the sampling period. The reader should refer to Section 6.4.4 for details of
this derivation. Moreover, if we assume that the spacing between the sensor elements, $l$,
is equal to one half of the wavelength of the plane-wave, Eq. (18.4) reduces to

$$\phi_k - \phi_{k-1} = \pi\sin\theta \tag{18.5}$$

Hence, $x_0(n) = \alpha(n)e^{j\phi_0}$, $x_1(n) = \alpha(n)e^{j(\phi_0 + \pi\sin\theta)}$, $\ldots$, and
$x_{M-1}(n) = \alpha(n)e^{j(\phi_0 + (M-1)\pi\sin\theta)}$. Using these results, one will find that

$$\mathbf{x}(n) = x_0(n)\mathbf{s}(\theta) \tag{18.6}$$

where

$$\mathbf{s}(\theta) = \begin{bmatrix} 1 \\ e^{j\pi\sin\theta} \\ \vdots \\ e^{j(M-1)\pi\sin\theta} \end{bmatrix} \tag{18.7}$$

The vector $\mathbf{s}(\theta)$ is called the steering vector.
The array output, after applying the gains $w_k$'s, is given by

$$y(n) = \mathbf{w}^{\mathrm H}\mathbf{x}(n) \tag{18.8}$$

Substituting Eq. (18.6) in Eq. (18.8), we obtain

$$y(n) = G(\theta)x_0(n) \tag{18.9}$$

where

$$G(\theta) = \mathbf{w}^{\mathrm H}\mathbf{s}(\theta) = \sum_{i=0}^{M-1} w_i^* e^{ji\pi\sin\theta} \tag{18.10}$$

is called the array gain for a narrowband signal arriving at the angle $\theta$. Note that, as
$x_0(n) = \alpha(n)e^{j\phi_0}$, within a phase error, $\phi_0 - \phi$, $x_0(n)$ is the same as $\alpha(n)$. Hence, $G(\theta)$
accurately reflects the gain imposed on the incoming signal after passing through the array
and being summed up through the array gains $w_k$'s. In particular, if we let $\phi_0 = \phi$, Eq. (18.6)
simplifies to

$$\mathbf{x}(n) = \alpha(n)\mathbf{s}(\theta) \tag{18.11}$$
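The steering vector of Eq. (18.7) and the array gain of Eq. (18.10) can be evaluated directly. The following is a minimal NumPy sketch for a half-wavelength-spaced linear array; the function names `steering_vector` and `array_gain`, the array size, and the uniform gain vector are assumptions made only for illustration.

```python
import numpy as np


def steering_vector(theta_deg, M):
    """s(theta) of Eq. (18.7) for a half-wavelength-spaced linear array."""
    i = np.arange(M)
    return np.exp(1j * np.pi * i * np.sin(np.deg2rad(theta_deg)))


def array_gain(w, theta_deg):
    """G(theta) = w^H s(theta), Eq. (18.10)."""
    return np.vdot(w, steering_vector(theta_deg, len(w)))


M = 8
w = np.ones(M) / M                  # uniform gains, an arbitrary illustrative choice
print(abs(array_gain(w, 0.0)))      # plane wave arriving from broadside
print(abs(array_gain(w, 30.0)))     # plane wave arriving from 30 degrees
```

`np.vdot` conjugates its first argument, so it computes $\mathbf{w}^{\mathrm H}\mathbf{s}(\theta)$ without an explicit conjugate transpose.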


18.1.2 Signal Subspace, Noise Subspace, and Spectral Factorization


Let a set of $L$ plane-wave signals $x_0(n), x_1(n), \cdots, x_{L-1}(n)$ with the direction angles
$\theta_0, \theta_1, \cdots, \theta_{L-1}$, respectively, be received by a linear array as in Figure 18.4. This clearly
is an extension to Figure 18.1. Recalling Eq. (18.11), the received signal vector at the
array outputs is given by

$$\mathbf{x}(n) = \sum_{k=0}^{L-1} \alpha_k(n)\mathbf{s}(\theta_k) + \mathbf{v}(n) \tag{18.12}$$

Figure 18.4 A linear array receiving L plane-wave signals.

where $\mathbf{v}(n)$ is an additive noise vector, and $\mathbf{s}(\theta_0), \mathbf{s}(\theta_1), \cdots, \mathbf{s}(\theta_{L-1})$ are the steering
vectors of the respective signals, viz.,

$$\mathbf{s}(\theta_k) = \begin{bmatrix} 1 \\ e^{j\pi\sin\theta_k} \\ \vdots \\ e^{j(M-1)\pi\sin\theta_k} \end{bmatrix}, \quad k = 0, 1, \cdots, L-1 \tag{18.13}$$

We assume that the elements of $\mathbf{v}(n)$ are a set of zero-mean, independent, and identically
distributed Gaussian random variables. Hence,

$$E[\mathbf{v}(n)\mathbf{v}^{\mathrm H}(n)] = \sigma_v^2\mathbf{I} \tag{18.14}$$

where $\sigma_v^2$ is the variance of each element of $\mathbf{v}(n)$ and $\mathbf{I}$ is the identity matrix.

Using Eqs. (18.12) and (18.14) and assuming that the baseband signals $\alpha_k(n)$, for
$k = 0, 1, \cdots, L-1$, are independent of one another, the correlation matrix of the array
signal vector, $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^{\mathrm H}(n)]$, is obtained as

$$\mathbf{R} = \mathbf{R}_s + \mathbf{R}_v \tag{18.15}$$

where

$$\mathbf{R}_s = \sum_{k=0}^{L-1} P_k\,\mathbf{s}(\theta_k)\mathbf{s}^{\mathrm H}(\theta_k) \tag{18.16}$$

is the portion of $\mathbf{R}$ arising from the signal components,

$$\mathbf{R}_v = \sigma_v^2\mathbf{I} \tag{18.17}$$

is the portion of $\mathbf{R}$ arising from the noise components, and $P_k = E[|\alpha_k(n)|^2]$.


Figure 18.5 Power spectral density of the process {x(n)} consisting of a white noise plus a
multi-tone signal.


A thorough understanding of an eigenanalysis of the correlation matrix R is essential 
and very helpful to our discussion in the rest of this chapter. To this end, consider the case 
where L = 1 and note that, for this case, R has the same form as the correlation matrix 
R in Example 4.2 (of Chapter 4). Moreover, recall that for the latter, R is the correlation 
matrix associated with a random process {x(n)} whose power spectral density is presented 
in Figure 4.3. For the more general case of Eq. (18.15), one may think of R as the 
correlation matrix associated with a random process {x(n)}, whose power spectral density 
is presented in Figure 18.5, where the tones are located at the frequencies $\omega_k = \pi\sin\theta_k$,
for $k = 0, 1, \cdots, L-1$ and have the respective power levels $P_0, P_1, \cdots, P_{L-1}$.

Next, recalling the minimax theorem that was introduced in Chapter 4 and the related 
discussions in the same chapter, one will find that if L < M and the eigenvalues of R 
are written in the descending order $\lambda_0, \lambda_1, \cdots, \lambda_{M-1}$, the last $M - L$ eigenvalues of $\mathbf{R}$
are all the same and equal to $\sigma_v^2$. Alternatively, one may note that $\mathbf{R}_s$ is a rank $L$ matrix
of size $M \times M$; hence $M - L$ of its eigenvalues are equal to zero. If we call the nonzero
eigenvalues of $\mathbf{R}_s$, $\lambda_{s,0}$ through $\lambda_{s,L-1}$, we will find that the eigenvalues of $\mathbf{R}$ are

$$\lambda_i = \begin{cases} \lambda_{s,i} + \sigma_v^2, & 0 \le i \le L-1 \\ \sigma_v^2, & L \le i \le M-1 \end{cases} \tag{18.18}$$


Moreover, we note that the set of steering vectors $\mathbf{s}(\theta_k)$ span a subspace of the $M$
dimensional Euclidean space. We call this subspace the signal subspace, for
obvious reasons. Accordingly, the subspace orthogonal to the signal subspace is called
the noise subspace.

Another important point that is derived from the above observations is that the first $L$
eigenvectors of $\mathbf{R}$ are in the signal subspace and, hence, those that are associated with
the eigenvalues $\sigma_v^2$ reside in the noise subspace. Finally, recalling the unitary similarity
transformation that was introduced in Chapter 4, $\mathbf{R}$ may be expanded as


$$\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{\mathrm H} \tag{18.19}$$

where $\boldsymbol{\Lambda}$ is the diagonal matrix of the eigenvalues of $\mathbf{R}$, listed in Eq. (18.18), and the
matrix $\mathbf{Q}$ is made up of a set of unit-length orthogonal eigenvectors of $\mathbf{R}$. Moreover, the
first $L$ columns of $\mathbf{Q}$ span the signal subspace of $\mathbf{R}$ and the rest of the columns of $\mathbf{Q}$
span the noise subspace of $\mathbf{R}$. Also, because of reasons that will become clear shortly,
Eq. (18.19) is often referred to as spectral factorization.
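The eigenvalue structure of Eq. (18.18) and the factorization of Eq. (18.19) can be verified numerically. The sketch below is an illustration under assumed values (a ten-element array, three unit-power sources, and a noise variance of 0.01); the function names are hypothetical, and the exact correlation matrix is built from Eqs. (18.15) to (18.17) rather than estimated from data.

```python
import numpy as np


def steering_vector(theta_deg, M):
    """s(theta) for a half-wavelength-spaced linear array, Eq. (18.13)."""
    i = np.arange(M)
    return np.exp(1j * np.pi * i * np.sin(np.deg2rad(theta_deg)))


def array_correlation(thetas_deg, powers, sigma2, M):
    """R = sum_k P_k s(theta_k) s^H(theta_k) + sigma_v^2 I, Eqs. (18.15)-(18.17)."""
    R = sigma2 * np.eye(M, dtype=complex)
    for theta, P in zip(thetas_deg, powers):
        s = steering_vector(theta, M)
        R += P * np.outer(s, s.conj())
    return R


M, L, sigma2 = 10, 3, 0.01
R = array_correlation([25.0, 28.0, -30.0], [1.0, 1.0, 1.0], sigma2, M)

lam, Q = np.linalg.eigh(R)          # eigh returns eigenvalues in ascending order
lam, Q = lam[::-1], Q[:, ::-1]      # reorder to the descending order used in the text
print(np.round(lam, 4))             # the last M - L values equal sigma2
```

`np.linalg.eigh` is used because $\mathbf{R}$ is Hermitian; it returns real eigenvalues and orthonormal eigenvectors, so `Q @ np.diag(lam) @ Q.conj().T` reproduces $\mathbf{R}$.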


18.1.3 Direction of Arrival Estimation 


Given the observed samples of the phasor signals in Figure 18.4, that is, the sample
vectors $\mathbf{x}(n)$ for a range of $n$, we wish to find the DOA of plane-wave signals $x_0(n)$,
$x_1(n), \cdots, x_{L-1}(n)$. When stated in the context of the presentation in Figure 18.5, the
goal is to find the location of the spectral components. This, clearly, can be thought of
as a spectral estimation problem. In the context of this chapter, this may be referred to
as a spatial spectral estimation problem. The challenge here is that each observed sample
of $\mathbf{x}(n)$ has a limited length equal to the array size. This limited, and possibly very
short, length of $\mathbf{x}(n)$ reduces the resolution of a conventional spectral estimator, if such
estimators are to be used. Hence, special signal analysis schemes that are tuned to the
structure of the observed vector $\mathbf{x}(n)$, and are thus expected to improve the accuracy of the
results, should be adopted. In the sequel, we introduce a number of such signal analysis
schemes/algorithms. These algorithms all start with an estimate of $\mathbf{R}$ or its inverse. Such
estimates can be obtained by averaging over a number of samples $\mathbf{x}(n)$ in an interval
$n_1 \le n \le n_2$, viz.,

$$\mathbf{R} = \frac{\sum_{n=n_1}^{n_2} \mathbf{x}(n)\mathbf{x}^{\mathrm H}(n)}{n_2 - n_1 + 1} \tag{18.20}$$

When the DOAs are changing over time and one wishes to track such changes, the
estimate of $\mathbf{R}$ can be updated using the recursive update equation

$$\mathbf{R}(n+1) = \lambda\mathbf{R}(n) + (1 - \lambda)\mathbf{x}(n)\mathbf{x}^{\mathrm H}(n) \tag{18.21}$$

where $\lambda$ is a forgetting factor, a number close to but smaller than one. Also, when needed,
the estimate of $\mathbf{R}^{-1}$ can be updated using the procedure that was developed in Chapter 12,
using the matrix inversion lemma.
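The two estimates of Eqs. (18.20) and (18.21) can be sketched as follows. This is a minimal NumPy illustration, not the book's code; the function names, the four-sensor white-noise test data, and the forgetting factor are assumptions chosen only so that the expected correlation matrix (the identity) is known.

```python
import numpy as np


def sample_correlation(X):
    """Eq. (18.20): average of x(n) x^H(n) over the columns of X (M x N)."""
    return X @ X.conj().T / X.shape[1]


def recursive_update(R, x, lam=0.99):
    """Eq. (18.21): R(n+1) = lam * R(n) + (1 - lam) * x(n) x^H(n)."""
    return lam * R + (1 - lam) * np.outer(x, x.conj())


# Illustrative data (assumed): unit-variance complex white noise on M = 4 sensors.
rng = np.random.default_rng(1)
M, N = 4, 20000
X = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

R_batch = sample_correlation(X)         # block estimate over all N snapshots
R_track = np.eye(M, dtype=complex)      # initial value for the tracking estimate
for n in range(N):
    R_track = recursive_update(R_track, X[:, n], lam=0.999)
```

The recursive form weights recent snapshots more heavily, so it trades some estimation accuracy for the ability to follow slowly changing DOAs.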

With this background, we are now ready to present the details of a number of DOA 
algorithms. A reader that may be interested in a wider variety of the DOA algorithms 
may refer to, for example, Godara (1997). 


Bartlett Method 


The most straightforward implementation of this class of algorithms is to calculate the
spatial spectrum of $\mathbf{x}(n)$, simply, by finding the inner product of the samples $\mathbf{x}(n)$ with the
steering vector $\mathbf{s}(\theta)$, squaring the magnitude of the result, and averaging over the range
$n_1 \le n \le n_2$, viz.,

$$S(\theta) = \frac{\sum_{n=n_1}^{n_2} |\mathbf{s}^{\mathrm H}(\theta)\mathbf{x}(n)|^2}{n_2 - n_1 + 1} \tag{18.22}$$

This procedure is called the Bartlett method, a name that has been borrowed from the
spectral estimation literature. Note that the spatial spectrum $S(\theta)$ is a function of DOA,
the angle $\theta$. Recalling Eq. (18.20), Eq. (18.22) can be rearranged as

$$S(\theta) = \mathbf{s}^{\mathrm H}(\theta)\mathbf{R}\mathbf{s}(\theta) \tag{18.23}$$

The DOAs are estimated by plotting the spatial spectrum $S(\theta)$ as a function of $\theta$ and
finding its peaks.
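Equation (18.23) translates into a short scan over candidate angles. The following NumPy sketch evaluates the Bartlett spectrum on a grid; the single-source test case (one unit-power source at 20 degrees, noise variance 0.01, exact correlation matrix) and the function names are assumptions made for illustration only.

```python
import numpy as np


def steering_vector(theta_deg, M):
    """s(theta) for a half-wavelength-spaced linear array, Eq. (18.7)."""
    i = np.arange(M)
    return np.exp(1j * np.pi * i * np.sin(np.deg2rad(theta_deg)))


def bartlett_spectrum(R, thetas_deg):
    """S(theta) = s^H(theta) R s(theta), Eq. (18.23), on a grid of angles."""
    M = R.shape[0]
    out = []
    for theta in thetas_deg:
        s = steering_vector(theta, M)
        out.append(np.real(np.vdot(s, R @ s)))
    return np.array(out)


# Assumed test case: one unit-power source at 20 degrees, noise variance 0.01.
M, theta0, sigma2 = 10, 20.0, 0.01
s0 = steering_vector(theta0, M)
R = np.outer(s0, s0.conj()) + sigma2 * np.eye(M)

grid = np.arange(-90.0, 90.5, 0.5)
S = bartlett_spectrum(R, grid)
print(grid[np.argmax(S)])      # angle at which the spectrum peaks
```

In practice `R` would be replaced by an estimate obtained from Eq. (18.20) or Eq. (18.21).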

Looking at this algorithm in the context of signal analysis and classical spectral
estimation, it is equivalent to passing the underlying signal through a bank of filters that are
built based on a prototype filter with a rectangular impulse response. The signal powers
at the outputs of the filter bank are taken as an estimate of the power spectral density of
the input signal to the filter bank. A plot of the magnitude response of the prototype filter
(which makes the zeroth subband of the filter bank) and a modulated version of it (which
makes another subband of the filter bank) are presented in Figure 18.6. Considering these
plots, one may note that this direct signal analysis suffers from two major problems:


• The relatively large side lobes of the prototype filter magnitude response result in a
leakage of the signal power of each tone to other portions of the frequency band. In the
context of DOA estimation, this results in some interference among the signals arriving
from different directions, hence, some error/bias in the DOA estimates.

• For a linear array, the width of the main lobe of the prototype filter response is
proportional to the inverse of the length of the array. Hence, unless the array size is large,
this method suffers from a low resolution in distinguishing between a pair of sources
with close DOAs.


Figure 18.6 Magnitude response of subband filters in a filter bank system with Bartlett prototype
filter. (a) Prototype filter (the 0th subband). (b) An arbitrary subband.

To clarify these points, we consider a ten-element array that is impinged by three
plane-wave signals from the angles $\theta_0 = 25°$, $\theta_1 = 28°$, and $\theta_2 = -30°$. We assume that
$P_0 = P_1 = P_2 = 1$, and $\sigma_v^2 = 0.01$. Taking 100 randomly and independently generated samples
of $\mathbf{x}(n)$, $\mathbf{R}$ is calculated according to Eq. (18.20) and the result is substituted in Eq. (18.23)
to evaluate the spatial spectrum $S(\theta)$. The result is presented in Figure 18.7. As observed,
$S(\theta)$ has a clear maximum at $\theta = \theta_2$; however, the responses arising from the DOAs $\theta_0$
and $\theta_1$ have overlapped with each other and thus may be confused with a single DOA at
the mid-point between $\theta_0$ and $\theta_1$.

Figure 18.7 Spatial spectrum $S(\theta)$ according to the Bartlett method, for a linear array of size
10 elements, when impinged by three equal power plane-wave signals from the angles $\theta_0 = 25°$,
$\theta_1 = 28°$, and $\theta_2 = -30°$. The signals are at 20 dB above the noise level.


MVDR Algorithm 


MVDR, or Minimum Variance Distortionless Response, is an elegant DOA estimation
method that resolves the low resolution problem of the Bartlett method by taking the
following approach. For each angle of arrival, $\theta$, an optimum tap-weight vector $\mathbf{w}_o(\theta)$
that has a gain of unity in the look direction (angle $\theta$) and minimizes the signal power
from all other directions is found and the spatial spectrum

$$S(\theta) = \mathbf{w}_o^{\mathrm H}(\theta)\mathbf{R}\mathbf{w}_o(\theta) \tag{18.24}$$

is calculated accordingly. Subsequently, the DOAs are estimated by locating the peaks of
$S(\theta)$.

To find $S(\theta)$ here, we may take the following steps. We wish to solve the following
constrained optimization problem.

Find $\mathbf{w}$ that minimizes the quadratic function $\mathbf{w}^{\mathrm H}\mathbf{R}\mathbf{w}$, subject to the constraint $\mathbf{s}^{\mathrm H}(\theta)\mathbf{w} = 1$.

This problem can be easily solved by using the method of Lagrange multipliers and taking
the steps presented in Section 6.11. It leads to the solution

$$\mathbf{w}_o(\theta) = \frac{\mathbf{R}^{-1}\mathbf{s}(\theta)}{\mathbf{s}^{\mathrm H}(\theta)\mathbf{R}^{-1}\mathbf{s}(\theta)} \tag{18.25}$$

Substituting Eq. (18.25) into Eq. (18.24), we obtain

$$S(\theta) = \frac{1}{\mathbf{s}^{\mathrm H}(\theta)\mathbf{R}^{-1}\mathbf{s}(\theta)} \tag{18.26}$$

Figure 18.8 Spatial spectrum $S(\theta)$ according to the MVDR algorithm, for a linear array of size
10 elements, when impinged by three equal power plane-wave signals from the angles $\theta_0 = 25°$,
$\theta_1 = 28°$, and $\theta_2 = -30°$. The signals are at 20 dB above the noise level.

Figure 18.8 presents the result of applying the MVDR algorithm to finding the DOA
of sources for the case that was introduced above. Comparing this result with the one
presented in Figure 18.7, one can see that the MVDR algorithm performs significantly
better than the Bartlett method. It has a surprisingly high resolution and identifies all DOAs
accurately. Nevertheless, we should note that the MVDR algorithm may not perform as well
when the impinging signals are comparable with the noise level. The MUSIC algorithm
that is introduced in the following section solves this problem to a great degree.
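The MVDR weight vector of Eq. (18.25) and spectrum of Eq. (18.26) can be sketched in NumPy as follows. The function names and the single-source test case (unit power at 20 degrees, noise variance 0.01, exact correlation matrix) are assumptions for illustration; for that case the spectrum at the source angle works out to $P + \sigma_v^2/M$, which gives a simple check.

```python
import numpy as np


def steering_vector(theta_deg, M):
    """s(theta) for a half-wavelength-spaced linear array, Eq. (18.7)."""
    i = np.arange(M)
    return np.exp(1j * np.pi * i * np.sin(np.deg2rad(theta_deg)))


def mvdr_weights(R, theta_deg):
    """w_o(theta) = R^{-1} s / (s^H R^{-1} s), Eq. (18.25)."""
    s = steering_vector(theta_deg, R.shape[0])
    R_inv_s = np.linalg.solve(R, s)         # R^{-1} s without forming the inverse
    return R_inv_s / np.vdot(s, R_inv_s)


def mvdr_spectrum(R, thetas_deg):
    """S(theta) = 1 / (s^H(theta) R^{-1} s(theta)), Eq. (18.26)."""
    M = R.shape[0]
    R_inv = np.linalg.inv(R)
    out = []
    for theta in thetas_deg:
        s = steering_vector(theta, M)
        out.append(1.0 / np.real(np.vdot(s, R_inv @ s)))
    return np.array(out)


# Assumed test case: one unit-power source at 20 degrees, noise variance 0.01.
M, theta0, P, sigma2 = 10, 20.0, 1.0, 0.01
s0 = steering_vector(theta0, M)
R = P * np.outer(s0, s0.conj()) + sigma2 * np.eye(M)

grid = np.arange(-90.0, 90.5, 0.5)
S = mvdr_spectrum(R, grid)
w_o = mvdr_weights(R, theta0)
print(grid[np.argmax(S)], S.max())
```

The weight vector satisfies the distortionless constraint $\mathbf{s}^{\mathrm H}(\theta)\mathbf{w}_o(\theta) = 1$ by construction.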


MUSIC Algorithm 


MUSIC, or MUltiple SIgnal Classification, is another elegant algorithm that can also
achieve high resolution DOA estimation. Moreover, it outperforms MVDR, particularly
in more noisy environments. As we will find, the excellent performance of the MUSIC algorithm
is attributed to the fact that it makes almost maximum use of the structure of the sampled
vector $\mathbf{x}(n)$ in estimating the DOAs.

Recalling the signal structure $\mathbf{x}(n)$ in Eq. (18.12) and the spectral factorization
(18.19), the MUSIC algorithm is implemented by taking the following steps.

1. Given the samples of $\mathbf{x}(n)$, calculate $\mathbf{R}$ according to Eq. (18.20).
2. Find the eigenvalues $\lambda_0, \lambda_1, \cdots, \lambda_{M-1}$ of $\mathbf{R}$ and the respective eigenvectors $\mathbf{q}_0, \mathbf{q}_1,
\cdots, \mathbf{q}_{M-1}$. Also assume that these are sorted such that $\lambda_0 \ge \lambda_1 \ge \cdots \ge \lambda_{M-1}$.
3. If the number of plane-wave signals, $L$, impinging the array is known, form the
$M \times (M - L)$ noise subspace matrix $\mathbf{Q}_v$ that consists of $\mathbf{q}_L$ through $\mathbf{q}_{M-1}$ at its
columns.

If $L$ is unknown, find an estimate of it by identifying the last eigenvalues of $\mathbf{R}$ that
are approximately equal. The number of these eigenvalues is equal to $M - L$. Once
this is found, form the noise subspace matrix $\mathbf{Q}_v$.

4. Evaluate the spatial spectrum

$$S(\theta) = \frac{1}{\|\mathbf{Q}_v^{\mathrm H}\mathbf{s}(\theta)\|^2} \tag{18.27}$$

5. The $L$ largest peaks of $S(\theta)$ are the estimates of DOAs.

To explain how the MUSIC algorithm makes maximum use of the sampled vectors $\mathbf{x}(n)$,
we first note that any steering vector $\mathbf{s}(\theta)$ that matches one of the plane-wave signals
will be orthogonal to the noise subspace. Hence, it will be orthogonal to the columns of
$\mathbf{Q}_v$, and, thus, $\|\mathbf{Q}_v^{\mathrm H}\mathbf{s}(\theta)\|^2$ will be a small quantity. This in turn implies that the spatial
spectrum (18.27) will have a large amplitude for any steering vector that falls within the
signal subspace.
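Steps 2 to 5 above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the book's implementation: the exact correlation matrix of Eqs. (18.15) to (18.17) stands in for the estimate of step 1, the number of sources $L$ is taken as known, and the function names and the simple peak search are hypothetical.

```python
import numpy as np


def steering_vector(theta_deg, M):
    """s(theta) for a half-wavelength-spaced linear array, Eq. (18.7)."""
    i = np.arange(M)
    return np.exp(1j * np.pi * i * np.sin(np.deg2rad(theta_deg)))


def music_spectrum(R, L, thetas_deg):
    """S(theta) = 1 / ||Q_v^H s(theta)||^2, Eq. (18.27)."""
    M = R.shape[0]
    lam, Q = np.linalg.eigh(R)      # ascending eigenvalues
    Qn = Q[:, :M - L]               # eigenvectors of the M - L smallest eigenvalues
    out = []
    for theta in thetas_deg:
        s = steering_vector(theta, M)
        out.append(1.0 / np.linalg.norm(Qn.conj().T @ s) ** 2)
    return np.array(out)


# Assumed scenario: the three-source case of the text with noise variance 0.1.
M, L, sigma2 = 10, 3, 0.1
R = sigma2 * np.eye(M, dtype=complex)
for theta in (25.0, 28.0, -30.0):
    s = steering_vector(theta, M)
    R += np.outer(s, s.conj())

grid = np.arange(-90.0, 90.5, 0.5)
S = music_spectrum(R, L, grid)

# Step 5: keep the L largest local peaks of the spectrum.
peaks = [k for k in range(1, len(S) - 1) if S[k] > S[k - 1] and S[k] >= S[k + 1]]
top = sorted(peaks, key=lambda k: S[k])[-L:]
doas = sorted(grid[top])
print(doas)
```

With a correlation matrix estimated from a finite number of snapshots the peaks are finite and less sharp than in this idealized case, and the peak search may need a finer angle grid.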

Figure 18.9 presents the result of applying the MUSIC algorithm to finding the DOA
of sources for the case that was introduced above and tested with the Bartlett and MVDR
algorithms. We previously found that the MVDR algorithm performs significantly better than
the Bartlett algorithm. Comparing the results of Figure 18.8 and Figure 18.9, it may appear
that the MUSIC algorithm outperforms the MVDR algorithm only by a small margin. However,
if the noise variance is increased, the difference between the two algorithms will become
more obvious. The result of one such experiment in which $\sigma_v^2 = 0.01$ is replaced by the
increased value $\sigma_v^2 = 0.1$ is presented in Figure 18.10. The difference is obvious and
significantly in favor of the MUSIC algorithm. In fact, in this case, the MVDR algorithm
fails to clearly distinguish $\theta_0$ and $\theta_1$.

Figure 18.9 Spatial spectrum $S(\theta)$ according to the MUSIC algorithm, for a linear array of size
10 elements, when impinged by three equal power plane-wave signals from the angles $\theta_0 = 25°$,
$\theta_1 = 28°$, and $\theta_2 = -30°$. The signals are at 20 dB above the noise level.

Figure 18.10 Comparison of the MVDR and MUSIC algorithms in a high noise environment.


18.1.4 Beamforming Methods 


Consider the linear array of Figure 18.1. As discussed above, when this array is impinged 
by a plane-wave from the direction $\theta$, it will have a gain $G(\theta)$ that is given by Eq. (18.10). 
A plot of $|G(\theta)|$ as a function of the angle of arrival $\theta$ is called the array pattern or 
beam pattern. The problem of beamforming is that of selecting the tap-weight vector $\mathbf{w}$ that 
results in a beam pattern with some desired characteristics. 


Conventional Beamforming 
In conventional beamforming, tuned to a look direction $\theta_0$, we simply set 

$$\mathbf{w} = \frac{1}{M}\mathbf{s}(\theta_0) \tag{18.28}$$

where $\mathbf{s}(\theta)$ is the steering vector, defined in Eq. (18.7). Substituting Eq. (18.28) in 
Eq. (18.10), we obtain 

$$G(\theta) = \frac{1}{M}\sum_{i=0}^{M-1} e^{ji\pi(\sin\theta - \sin\theta_0)} \tag{18.29}$$


Example 18.1 


Figure 18.11 presents a plot of the beam pattern of Eq. (18.29), when $M = 10$ and 
$\theta_0 = 45^\circ$. As expected, the beam pattern is directed towards the look direction $\theta = \theta_0$. 

Figure 18.11 Beam pattern of a conventional beamformer, for a ten element linear array. 


It has a gain of unity in this direction, and its gain is smaller than unity in other directions. 
Moreover, as Eq. (18.29) implies, here, the beam pattern is symmetric with respect to 
the angle 90° and, hence, it also sees a plane-wave that may be arriving from the angle 
90° + 45° = 135°. 
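
The observations of this example can be reproduced with a short sketch. It assumes the steering vector $[1, e^{j\pi\sin\theta}, \cdots, e^{j(M-1)\pi\sin\theta}]^T$ of a half-wavelength-spaced array and the gain $G(\theta) = \mathbf{w}^H\mathbf{s}(\theta)$; the function names are illustrative.

```python
import cmath
import math


def steering(theta_deg, M):
    """Steering vector of a half-wavelength-spaced linear array (assumed form)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * i) for i in range(M)]


def conventional_gain(theta_deg, theta0_deg, M):
    """G(theta) = w^H s(theta) with w = s(theta0) / M."""
    w = [x / M for x in steering(theta0_deg, M)]
    s = steering(theta_deg, M)
    return sum(wi.conjugate() * si for wi, si in zip(w, s))


if __name__ == "__main__":
    # Gain magnitude at the look direction, its mirror image about 90 degrees,
    # and two other directions.
    for theta in (45, 135, 0, 90):
        print(theta, abs(conventional_gain(theta, 45, 10)))
```

Because the gain depends on $\theta$ only through $\sin\theta$, the directions $45^\circ$ and $135^\circ$ are indistinguishable to the linear array.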


Conventional beamforming is optimal in a certain sense, and its derivation from that 
viewpoint is instructive. Let us assume that the array is impinged by a 
plane-wave with the angle of arrival $\theta_0$ and there is no other impinging signal. However, 
there is an additive noise with variance $\sigma_\nu^2$ at each element of the array. For this setup, 
the beamformer tap-weight vector $\mathbf{w}$ can be optimally chosen as follows. Find $\mathbf{w}$ that 
minimizes the cost function 

$$\xi = \sigma_\nu^2\,\mathbf{w}^H\mathbf{w} \tag{18.30}$$

subject to the constraint 

$$\mathbf{w}^H\mathbf{s}(\theta_0) = 1 \tag{18.31}$$

This constrained minimization makes perfect sense if we note that the cost function $\xi$ is 
the noise variance at the beamformer output, and the constraint (18.31) allows the 
plane-wave signal to go through the beamformer with a gain of unity. This clearly maximizes 
the signal-to-noise ratio at the beamformer output and, thus, is optimum in that sense. 

The solution to the above constrained minimization can be easily found by using the method 
of Lagrange multipliers, following the same procedure as the one in Section 6.11. 
Such a procedure leads to Eq. (18.28). 
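
As a cross-check that does not rely on Lagrange multipliers, the same result follows from the Cauchy-Schwarz inequality. The only assumption is that the elements of $\mathbf{s}(\theta_0)$ have unit magnitude, so that $\|\mathbf{s}(\theta_0)\|^2 = M$:

```latex
\begin{align*}
1 = |\mathbf{w}^H\mathbf{s}(\theta_0)|^2
  &\le \|\mathbf{w}\|^2\,\|\mathbf{s}(\theta_0)\|^2 = M\,\mathbf{w}^H\mathbf{w}
  \;\Rightarrow\;
  \xi = \sigma_\nu^2\,\mathbf{w}^H\mathbf{w} \ge \frac{\sigma_\nu^2}{M}, \\
\text{with equality iff } \mathbf{w} &= c\,\mathbf{s}(\theta_0); \qquad
\mathbf{w}^H\mathbf{s}(\theta_0) = c^{*}M = 1
  \;\Rightarrow\; \mathbf{w} = \frac{1}{M}\,\mathbf{s}(\theta_0).
\end{align*}
```

The minimum output noise variance is thus $\sigma_\nu^2/M$, that is, the array reduces the noise power by a factor equal to the number of elements.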


Null-Steering Beamforming 


The conventional beamformer selects a beam pattern with a gain of unity in the desired 
look direction, without regard to other signals that may impinge on the array. 
Although, as demonstrated above, the conventional beamformer is optimum when the 
desired signal is the only plane-wave impinging on the array, it may behave poorly in 
the presence of other impinging signals. Hence, better choices of $\mathbf{w}$ should be 
sought. 

A simple solution is to select $\mathbf{w}$ so that nulls are introduced in the directions of the 
interfering signals, while the array gain is constrained to unity in the desired look direction. 
This is mathematically formulated as follows. If $\theta_0$ is the desired look direction, and 
$\theta_1, \theta_2, \cdots, \theta_L$ are the directions of the interference signals, $\mathbf{w}$ has to be selected to 
satisfy the following equations simultaneously: 

$$\mathbf{w}^H\mathbf{s}(\theta_0) = 1 \tag{18.32}$$

and 

$$\mathbf{w}^H\mathbf{s}(\theta_k) = 0, \quad \text{for } k = 1, 2, \cdots, L \tag{18.33}$$

Rearranging the left-hand sides of these equations in the form of $\mathbf{s}^H(\theta_k)\mathbf{w}$, for 
$k = 0, 1, \cdots, L$, and combining the results, we obtain 

$$\mathbf{S}^H\mathbf{w} = \mathbf{e}_0 \tag{18.34}$$

where 

$$\mathbf{S} = [\mathbf{s}(\theta_0)\ \ \mathbf{s}(\theta_1)\ \ \cdots\ \ \mathbf{s}(\theta_L)] \tag{18.35}$$

is a matrix of size $M \times (L + 1)$, and $\mathbf{e}_0 = [1\ 0\ 0\ \cdots\ 0]^T$ is a column vector of size 
$L + 1$. 


When $L + 1 > M$, Eq. (18.34) is overdetermined and, hence, in general has no solution. 
When $L + 1 = M$, and the columns of $\mathbf{S}$ are linearly independent, the solution to 
Eq. (18.34) is 

$$\mathbf{w} = (\mathbf{S}^H)^{-1}\mathbf{e}_0 \tag{18.36}$$

Recalling the definition of $\mathbf{e}_0$, here, $\mathbf{w}$ is equal to the first column of the inverse of $\mathbf{S}^H$. 

The more common case is when $L + 1 < M$. In this case, Eq. (18.34) is underdetermined 
and thus has no unique solution. To limit the solution to a unique one, it 
is common to reformulate the problem as follows. We find $\mathbf{w}$ that minimizes the cost function 
(18.30), subject to the constraints specified by Eqs. (18.32) and (18.33). This minimizes 
the noise variance at the beamformer output, produces nulls in the directions of 
the undesired plane-waves impinging on the array, and lets the desired signal pass through 
the array with a gain of unity. 

This constrained minimization can be easily solved by using the method of Lagrange 
multipliers. Because of the multiple constraints, here we need an extension to the procedure 
that was presented in Section 6.11. We define 

$$\xi^c = \sigma_\nu^2\,\mathbf{w}^H\mathbf{w} + (\mathbf{S}^H\mathbf{w} - \mathbf{e}_0)^H\boldsymbol{\lambda} \tag{18.37}$$

where $\boldsymbol{\lambda} = [\lambda_0\ \lambda_1\ \cdots\ \lambda_L]^T$ is a vector of Lagrange multipliers. The optimum value of 
$\mathbf{w}$, which we call $\mathbf{w}_{\rm ns}$ (where 'ns' stands for null-steering), is obtained by simultaneous 
solution of 

$$\nabla_{\mathbf{w}}\xi^c = 2\sigma_\nu^2\,\mathbf{w}_{\rm ns} + 2\mathbf{S}\boldsymbol{\lambda} = \mathbf{0} \tag{18.38}$$

and 

$$\nabla_{\boldsymbol{\lambda}}\xi^c = (\mathbf{S}^H\mathbf{w}_{\rm ns} - \mathbf{e}_0)^{*} = \mathbf{0} \tag{18.39}$$

From Eq. (18.38), we obtain 

$$\mathbf{w}_{\rm ns} = -\frac{1}{\sigma_\nu^2}\,\mathbf{S}\boldsymbol{\lambda} \tag{18.40}$$

Substituting Eq. (18.40) in Eq. (18.39), we get 

$$\boldsymbol{\lambda} = -\sigma_\nu^2\,(\mathbf{S}^H\mathbf{S})^{-1}\mathbf{e}_0 \tag{18.41}$$

Finally, substituting Eq. (18.41) in Eq. (18.40), we obtain 

$$\mathbf{w}_{\rm ns} = \mathbf{S}(\mathbf{S}^H\mathbf{S})^{-1}\mathbf{e}_0 \tag{18.42}$$

It is interesting to note that the optimum constrained tap-weight vector $\mathbf{w}_{\rm ns}$ is independent 
of the noise variance, $\sigma_\nu^2$. 


Example 18.2 


For the example that was studied to evaluate the DOA estimations above, we have 
generated the beam pattern of the array when it is set to receive the plane-wave arriving 
from the angle $20^\circ$, while rejecting those arriving from the angles $25^\circ$ and $-30^\circ$. The beam 
pattern of the designed beamformer is presented in Figure 18.12. Careful examination of 
this figure reveals that, as one would expect, the array gain in the look direction $20^\circ$ is 
unity, and it is zero in the directions $25^\circ$ and $-30^\circ$ (note that $-30^\circ = 330^\circ$). 
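
The gains quoted in this example can be verified with a minimal sketch of $\mathbf{w}_{\rm ns} = \mathbf{S}(\mathbf{S}^H\mathbf{S})^{-1}\mathbf{e}_0$. The steering vector form (half-wavelength spacing), the small Gaussian-elimination helper and all function names are assumptions made for illustration.

```python
import cmath
import math


def steering(theta_deg, M):
    """Steering vector of a half-wavelength-spaced linear array (assumed form)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * i) for i in range(M)]


def solve(A, b):
    """Solve A x = b for a small complex system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def null_steering_weights(look_deg, interferers_deg, M):
    """w_ns = S (S^H S)^{-1} e_0, with S = [s(look), s(interferer 1), ...]."""
    cols = [steering(t, M) for t in [look_deg] + list(interferers_deg)]
    # Gram matrix S^H S: entry (i, j) is s_i^H s_j.
    gram = [[sum(a.conjugate() * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    e0 = [1 + 0j] + [0j] * (len(cols) - 1)
    g = solve(gram, e0)  # g = (S^H S)^{-1} e_0
    return [sum(g[k] * cols[k][i] for k in range(len(cols))) for i in range(M)]


def gain(w, theta_deg):
    """Array gain G(theta) = w^H s(theta)."""
    s = steering(theta_deg, len(w))
    return sum(wi.conjugate() * si for wi, si in zip(w, s))


if __name__ == "__main__":
    w = null_steering_weights(20.0, [25.0, -30.0], 10)
    for theta in (20.0, 25.0, -30.0):
        print(theta, abs(gain(w, theta)))
```

Only an $(L+1) \times (L+1)$ system has to be solved, regardless of the array size $M$.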


Figure 18.12 Beam pattern of the null-steering beamformer, for a ten element linear array. It is 
designed for the look direction $20^\circ$. There are interferers in the directions $25^\circ$ and $-30^\circ$. 


Optimal Beamforming 


In a null-steering beamformer, the design is performed to minimize the noise variance at 
the array output, while perfect nulls are introduced in the directions of the undesired 
plane-waves. One may argue that this may not be a good design. The optimum design is the 
one that strikes a balance between minimizing the noise and minimizing the residual interference 
from the plane-wave signals that impinge on the array from directions other than the 
look direction. This optimum design, which is often referred to as Capon beamforming 
(Capon, 1969), is carried out by minimizing the cost function 

$$\xi = E[|y(n)|^2] = \mathbf{w}^H\mathbf{R}\mathbf{w} \quad \text{subject to the constraint} \quad \mathbf{w}^H\mathbf{s}(\theta_0) = 1 \tag{18.43}$$


Recall that $y(n)$ is the array output, and $\mathbf{R}$ is the correlation matrix of the signal samples 
at the array inputs. Also, recall the structure of $\mathbf{R}$ as presented in Eq. (18.15). 
The constrained minimization (18.43) can also be solved using the method of Lagrange 
multipliers. The result is 

$$\mathbf{w}_o^c = \frac{\mathbf{R}^{-1}\mathbf{s}(\theta_0)}{\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)} \tag{18.44}$$

The optimum/Capon beamformer is also called the minimum variance distortionless 
response (MVDR) beamformer, as it delivers a distortionless copy of the signal coming from the 
look direction while minimizing the variance of the sum of the noise and any residual 
from interfering signals coming from other directions. Since the level of the desired 
signal is kept while the variance of the noise plus interference is minimized, one may say that 
the optimum beamformer maximizes the signal-to-interference-plus-noise ratio (SINR) 
at its output. 
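
For reference, the steps that lead to Eq. (18.44) can be sketched by mirroring the null-steering derivation of Eqs. (18.37) to (18.42), with $\sigma_\nu^2\mathbf{I}$ replaced by $\mathbf{R}$ and $\mathbf{S}$ replaced by the single column $\mathbf{s}(\theta_0)$. This is an outline under the assumption that $\mathbf{R}$ is Hermitian and positive definite, so that $\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)$ is real and positive:

```latex
\begin{align*}
\xi^c &= \mathbf{w}^H\mathbf{R}\mathbf{w}
        + \left(\mathbf{s}^H(\theta_0)\mathbf{w} - 1\right)^{*}\lambda \\
\nabla_{\mathbf{w}}\xi^c &= 2\mathbf{R}\mathbf{w} + 2\lambda\,\mathbf{s}(\theta_0) = \mathbf{0}
  \;\Rightarrow\; \mathbf{w} = -\lambda\,\mathbf{R}^{-1}\mathbf{s}(\theta_0) \\
\mathbf{s}^H(\theta_0)\mathbf{w} = 1
  &\;\Rightarrow\; \lambda = -\frac{1}{\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)} \\
\mathbf{w}_o^c &= \frac{\mathbf{R}^{-1}\mathbf{s}(\theta_0)}
                      {\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)},
\qquad
\xi_{\min} = \mathbf{w}_o^{cH}\mathbf{R}\,\mathbf{w}_o^c
           = \frac{1}{\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)}
\end{align*}
```

The last expression is the minimum output power of the beamformer when it is steered to $\theta_0$.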


Example 18.3 


A beam pattern of the optimum beamformer for the three plane-wave case that has been 
considered in our previous studies in this chapter is presented in Figure 18.13, for the 
case where $\sigma_\nu^2 = 1$. The result is very similar to the one in Figure 18.12. The difference 
between the two beam patterns is minor and can be seen only by presenting the two 
plots in Cartesian coordinates. These plots are presented in Figure 18.14. As noted, the 
optimum beamformer reduces the gain of the array for most of the values of $\theta$. This 
reduces the noise variance at the array output. However, the reduced noise at the array 
output comes at the cost of a slight shift of the null at $\theta = 25^\circ$. By design, the choice of 
$\mathbf{w}_o^c$ balances the joint suppression of the noise and the interference caused by the 
undesirable plane-waves. 


Further experiments reveal the expected result that as $\sigma_\nu^2$ decreases, the optimum 
beamformer approaches the null-steering beamformer. On the other hand, when the undesirable 
plane-waves have significantly lower power than the noise, the optimum beamformer 
approaches the conventional beamformer. 

Figure 18.13 Beam pattern of the optimum beamformer for a ten element linear array. It is 
designed for the look direction $20^\circ$. There are interferers in the directions $25^\circ$ and $-30^\circ$ and there 
is an additive noise at each element with variance $\sigma_\nu^2 = 1$. 

Figure 18.14 Beam patterns of the null-steering and optimum beamformers in Cartesian coordinates 
for a ten element linear array. They are designed for the look direction $20^\circ$. There are interferers in the 
directions $25^\circ$ and $-30^\circ$ and there is an additive noise at each element with variance $\sigma_\nu^2 = 1$. 

In comparing the optimum beamformer with the null-steering beamformer, we note that 
while the latter requires us to know the directions of the desired and interfering signals, the 
former requires only the direction of the desired signal. This makes the design 
and implementation of an adaptive filter for adjustment of the optimum beamformer 
coefficients a straightforward task. One can simply use the constrained LMS algorithm 
that was developed in Section 6.11. 

The above observation seems to imply that the optimum beamformer is superior to 
the null-steering beamformer with respect to both performance and implementation. This 
statement, unfortunately, remains true only if the look direction is known perfectly or, 
at least, with very good precision. If the assumed look direction is not very close 
to the angle of arrival of the desired user, the optimum beamformer will interpret the 
desired user as another interferer and will attempt to remove it from the output of the 
array. The null-steering method, on the other hand, generates a main lobe around the look 
direction and thus the generated beam pattern can tolerate some variation of the angle of 
arrival of the desired plane-wave. In other words, the null-steering method is more robust 
than the optimum beamforming; however, it requires more information and may be more 
complex to implement. 


Beam-Space Beamforming 


The beamforming techniques that have been presented so far are collectively referred to 
as element-space beamforming/processing, as they generate an array output by combining 
the signals taken directly from the array elements. Beam-space beamforming, in contrast, 
as its name implies, combines signals from a number of already-formed beams (beamed 
signals) to obtain the array output. 

Figure 18.15 presents a block diagram of the beam-space beamformer. The signals from 
the array elements are first combined according to the conventional beamforming 
method to enhance the signal impinging on the array from the look direction (angle of 
arrival $\theta_0$). We call this the primary beamformer. It has the fixed coefficient vector 
$\mathbf{p} = \mathbf{s}(\theta_0)$. The auxiliary beam generator block, also called the blocking beamformer, contains 
$L - 1$ beamformers that all have a null to remove the desired signal while enhancing the 
signals impinging on the array from the other directions. The coefficient vectors of these 
beamformers are the columns of a matrix $\mathbf{B}$ that we refer to as the blocking matrix. The 
beamed signals $y_1(n)$ through $y_{L-1}(n)$ are thus generated as 

$$\mathbf{y}(n) = \mathbf{B}^H\mathbf{x}(n) \tag{18.45}$$

where $\mathbf{x}(n) = [x_0(n)\ x_1(n)\ \cdots\ x_{M-1}(n)]^T$ and $\mathbf{y}(n) = [y_1(n)\ \cdots\ y_{L-1}(n)]^T$. The 
beamed signals are then passed through an adaptive linear combiner, with coefficients 
$w_1(n)$ through $w_{L-1}(n)$. The combiner output is compared with the output of the primary 
beamformer, and the tap weights are adapted to minimize the mean-squared value of the sum of 
the undesirable signals and noise at the array output $y(n)$. LMS or any other adaptive algorithm 
may be used for adaptation of the coefficients $w_1(n)$ through $w_{L-1}(n)$. 

Figure 18.15 Beam-space beamformer. 

The following criteria are considered to set the columns of the blocking matrix $\mathbf{B}$: 


• Columns of $\mathbf{B}$ must be orthogonal to the primary beamformer coefficient vector $\mathbf{p}$. This 
is to avoid the presence of any signal from the desired look direction in the beamed 
signals $y_1(n)$ through $y_{L-1}(n)$. 

• Columns of $\mathbf{B}$ should be orthogonal to one another. This reduces the correlation 
between the beamed signals $y_1(n)$ through $y_{L-1}(n)$ and, hence, improves the 
convergence behavior of the underlying adaptive linear combiner. 


Various methods can be adopted to satisfy these criteria. Here, we propose one method 
that benefits from the eigenanalysis results that were presented in Chapter 4. 

Assuming that the directions of the desired signal and the other plane-wave signals that 
impinge on the array are known and given by the steering vectors $\mathbf{s}(\theta_0), \mathbf{s}(\theta_1), \cdots, \mathbf{s}(\theta_{L-1})$, 
we construct the correlation matrix 

$$\mathbf{R} = K\,\mathbf{s}(\theta_0)\mathbf{s}^H(\theta_0) + \sum_{l=1}^{L-1}\mathbf{s}(\theta_l)\mathbf{s}^H(\theta_l) + \sigma_\nu^2\,\mathbf{I} \tag{18.46}$$

As the coefficient $K$ is increased to a large value, the first eigenvector of $\mathbf{R}$, that is, the one 
that corresponds to its largest eigenvalue, approaches $\mathbf{s}(\theta_0)$ to within a scale factor. This 
follows from the minimax theorem that was introduced in Chapter 4. We let the primary 
beamformer coefficient vector $\mathbf{p}$ be equal to this first eigenvector of $\mathbf{R}$. Any subset of size 
$L - 1$ of the rest of the eigenvectors of $\mathbf{R}$, as columns of $\mathbf{B}$, satisfies the two criteria that 
were mentioned above. Furthermore, to make sure that the plane-wave signals from the directions 
$\theta_1$ through $\theta_{L-1}$ will be present in the beamed signals $y_1(n)$ through $y_{L-1}(n)$, one should choose 
the second to $L$th eigenvectors of $\mathbf{R}$ as the columns of the blocking matrix $\mathbf{B}$. This is 
also a consequence of the properties that are derived from the minimax theorem. Further 
exploration related to this procedure of setting the columns of $\mathbf{B}$ is deferred, and is left 
as an exercise at the end of this chapter. 
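
The claim that the first eigenvector of the matrix in Eq. (18.46) approaches $\mathbf{s}(\theta_0)$ as $K$ grows can be illustrated with a small sketch. It is not the procedure of the exercise: it uses plain power iteration to estimate the first eigenvector, assumes a half-wavelength-spaced array, and uses illustrative angles, noise variance and function names.

```python
import cmath
import math


def steering(theta_deg, M):
    """Steering vector of a half-wavelength-spaced linear array (assumed form)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * i) for i in range(M)]


def build_R(K, look_deg, others_deg, sigma2, M):
    """R = K s0 s0^H + sum_l s_l s_l^H + sigma2 * I."""
    R = [[(sigma2 if i == j else 0.0) + 0j for j in range(M)] for i in range(M)]
    for weight, theta in [(K, look_deg)] + [(1.0, t) for t in others_deg]:
        s = steering(theta, M)
        for i in range(M):
            for j in range(M):
                R[i][j] += weight * s[i] * s[j].conjugate()
    return R


def dominant_eigenvector(R, iterations=500):
    """Power iteration: unit-norm estimate of the eigenvector of the largest eigenvalue."""
    M = len(R)
    v = [1 + 0j] * M
    for _ in range(iterations):
        v = [sum(R[i][j] * v[j] for j in range(M)) for i in range(M)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        v = [x / norm for x in v]
    return v


def alignment(K, look_deg=20.0, others_deg=(25.0, -30.0), sigma2=0.01, M=10):
    """|p^H s0| / sqrt(M); equals 1 when p is parallel to s(theta_0)."""
    p = dominant_eigenvector(build_R(K, look_deg, others_deg, sigma2, M))
    s0 = steering(look_deg, M)
    return abs(sum(pi.conjugate() * si for pi, si in zip(p, s0))) / math.sqrt(M)


if __name__ == "__main__":
    for K in (1.0, 10.0, 1000.0):
        print(K, alignment(K))
```

The remaining eigenvectors, which would form the columns of $\mathbf{B}$, require a full eigendecomposition routine and are not computed here.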


18.2 Broadband Sensor Arrays 


Broadband processing of sensor array signals is built on the same principles as the 
narrowband processing techniques that have been presented so far in this chapter. To find 
the DOAs of multiple broadband signals, the signal at each element is decomposed into a 
set of narrowband signals through a bank of filters. The narrowband signals from the same 
subband of all the array elements are then taken as a set, and any of the DOA methods that 
were presented in Section 18.1.3 may be used to find the DOAs of the various signals 
impinging on the array. Moreover, the results from different subbands may be averaged to 
improve the accuracy of the estimated DOAs. For the rest of this section, we assume 
that the necessary DOAs have already been obtained and thus concentrate on a number 
of beamforming methods. 


18.2.1 Steering 


Recall that in narrowband beamformers, the demodulated signals at the array elements 
could be represented by phasors. The phasors associated with a plane-wave signal in an 
array are differentiated by their phase angles, which are determined by the array topology 
and the DOA of the signal. For instance, for the linear array presented in Figure 18.1, 
these phase angles are quantified by the steering vector $\mathbf{s}(\theta)$ of Eq. (18.7). We also recall 
that the phase angles that are reflected in $\mathbf{s}(\theta)$ represent the delays between the arrival 
times of the plane-wave at the various elements of the array. 

For broadband signals, unfortunately, the phasor representation of the signals is not 
applicable; hence, the delays cannot be realized through a simple steering vector. Steering 
of broadband signals is performed by introducing time delays at the array elements 
and selecting these delays such that the desired signal components impinging on the array 
elements are time-aligned at the outputs of the delay blocks. This concept is presented in 
Figure 18.16. For a signal $x(n)$ that impinges on the array from an angle $\theta$, its replicas at 
the outputs of the delay blocks will be time-aligned when 

$$\tau_k = \frac{k\,l\sin\theta}{c}, \quad \text{for } k = 0, 1, \cdots, M-1 \tag{18.47}$$

where $c$ is the propagation speed. Signals impinging on the array from other angles will 
not be time-aligned. Hence, when $\tau_0$ through $\tau_{M-1}$ are set according to Eq. (18.47), $x(n)$ 
will be enhanced at the array output, $y(n)$, while signals impinging on the array from other 
directions will not. This will result in a beam pattern that has a main lobe for the DOA of $\theta$. 
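
As a minimal numerical illustration of Eq. (18.47), the following sketch evaluates the steering delays and expresses them in sample periods. Here $l$ is taken to be the element spacing; the parameter names, units and numerical values are assumptions chosen for illustration only.

```python
import math


def steering_delays(theta_deg, M, spacing_m, speed_m_per_s):
    """tau_k = k * l * sin(theta) / c for k = 0, ..., M-1."""
    sin_theta = math.sin(math.radians(theta_deg))
    return [k * spacing_m * sin_theta / speed_m_per_s for k in range(M)]


def delays_in_samples(delays_s, fs_hz):
    """Express the delays in (generally fractional) sample periods."""
    return [tau * fs_hz for tau in delays_s]


if __name__ == "__main__":
    # Hypothetical acoustic array: 4 elements, 5 cm spacing, c = 340 m/s, fs = 48 kHz.
    taus = steering_delays(30.0, 4, 0.05, 340.0)
    print(delays_in_samples(taus, 48000.0))
```

The delays are generally not integer multiples of the sample period, which is why interpolation between samples may be needed. For negative $\theta$ the computed values are negative; adding a common offset to all elements keeps the delays causal without affecting the alignment.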

Since the processing of array signals is usually performed digitally, a demodulator 
followed by an analog-to-digital converter (ADC) is inserted at the output of each antenna. 
The sampled signals are passed through a bank of shift-registers from which the delayed 
signals are tapped out. Interpolation may also be performed over the samples from the 
shift-registers to realize the desired delays with sufficient accuracy. Such an 
implementation also allows one to obtain multiple signal samples from each shift-register and, 
hence, obtain signal samples for more than one steering direction. 

Figure 18.16 Steering in broadband beamformer. 


18.2.2 Beamforming Methods 


Using the digital delay implementation that was introduced above, signal samples for 
different steering directions can be easily obtained. In the rest of this section, we present the 
extensions of the conventional and optimum narrowband beamformers that were 
introduced in Section 18.1.4 to broadband signals. Direct extensions of the null-steering and 
beam-space beamformers to broadband signals are not possible. However, they may be 
implemented if a broadband signal is partitioned into a sufficient number of narrowband 
signals, and each narrowband signal is treated as a phasor. 


Conventional Beamforming 


For narrowband signals, conventional beamforming was established by multiplying 
the phasor signals $x_0(n)$ through $x_{M-1}(n)$ by the conjugates of the elements of the steering 
vector, $\mathbf{s}(\theta)$. This is equivalent to time-aligning the signals received by the array elements 
for the desired look direction. Applying the same concept to broadband signals, one will 
find that a conventional beamformer is implemented simply by adding a scaling factor 
a at the output of Figure 18.16. This is presented in Figure 18.17. 


Figure 18.17 Conventional beamformer for broadband signals. 


Optimal Beamforming 


For broadband signals, an optimal beamformer can be implemented according to the 
structure presented in Figure 18.18. This structure, which was originally proposed by 
Frost (1972), has been explored and extended by many other researchers as well, for 
example, Er and Cantoni (1983) and Buckley and Griffiths (1986). 

Figure 18.18 A structure for an optimal broadband beamformer. 

In Figure 18.18, $W_0(z)$ through $W_{M-1}(z)$ are a set of adaptive FIR filters, each of 
length $N$. The signal of interest, $x_0(n)$, impinging on the array from the look direction $\theta_0$, 
is time-aligned at the output of the delay blocks $\tau_0$ through $\tau_{M-1}$. This, in turn, implies 
that for the desired signal, the array is characterized by the transfer function 

$$C(z) = \sum_{i=0}^{M-1} W_i(z) \tag{18.48}$$

A trivial choice for $C(z)$ may be $C(z) = z^{-\Delta}$, where $\Delta$ is a fixed delay. In this case, if 
$W_0(z)$ through $W_{M-1}(z)$ are constrained to satisfy Eq. (18.48) with $C(z) = z^{-\Delta}$, a delayed 
replica of $x_0(n)$ will appear at the array output. Alternatively, $C(z)$ may be selected to 
apply some filtering to $x_0(n)$, for example, to reduce any out-of-band additive noise that 
the channel has added to the received signal. Here, we denote such a choice of $C(z)$ by 
$C_0(z)$ and adapt $W_0(z)$ through $W_{M-1}(z)$ to minimize the cost function 

$$\xi = E[|y(n)|^2] \tag{18.49}$$

subject to the constraint 

$$\sum_{i=0}^{M-1} W_i(z) = C_0(z) \tag{18.50}$$

The solution to this constrained minimization can be derived as follows. Let 

$$\mathbf{w} = [\mathbf{w}_0^T\ \ \mathbf{w}_1^T\ \ \cdots\ \ \mathbf{w}_{M-1}^T]^T \tag{18.51}$$

where $\mathbf{w}_i = [w_{i,0}\ w_{i,1}\ \cdots\ w_{i,N-1}]^T$, for $i = 0, 1, \cdots, M-1$, is the tap-weight vector 
of $W_i(z)$. Also, we use the vector $\mathbf{c}_0 = [c_{0,0}\ c_{0,1}\ \cdots\ c_{0,N-1}]^T$ to denote the tap-weight 
vector associated with $C_0(z)$. Moreover, we define $\mathbf{x}(n)$ as the vector of tap values 
associated with $\mathbf{w}$. Accordingly, 

$$y(n) = \mathbf{w}^H\mathbf{x}(n) \tag{18.52}$$

and the constraint (18.50) may be written as 

$$\mathbf{C}\mathbf{w} = \mathbf{c}_0 \tag{18.53}$$

where 

$$\mathbf{C} = [\mathbf{I}\ \ \mathbf{I}\ \ \cdots\ \ \mathbf{I}] \tag{18.54}$$

and $\mathbf{I}$ is the identity matrix of size $N$, and there are $M$ identity matrices in $\mathbf{C}$. Substituting 
Eq. (18.52) in Eq. (18.49), we obtain 

$$\xi = \mathbf{w}^H\mathbf{R}\mathbf{w} \tag{18.55}$$

where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^H(n)]$. 
To minimize the cost function $\xi$ subject to the constraint (18.53), we use the method 
of Lagrange multipliers and accordingly define 

$$\xi^c = \mathbf{w}^H\mathbf{R}\mathbf{w} + (\mathbf{C}\mathbf{w} - \mathbf{c}_0)^H\boldsymbol{\lambda} \tag{18.56}$$

where $\boldsymbol{\lambda}$ is a vector of Lagrange multipliers. Following the same derivations as those that 
led to Eq. (18.42), here we obtain 

$$\mathbf{w}_o^c = \mathbf{R}^{-1}\mathbf{C}^T(\mathbf{C}\mathbf{R}^{-1}\mathbf{C}^T)^{-1}\mathbf{c}_0 \tag{18.57}$$

As in the case of the constrained LMS algorithm that was developed in Chapter 6 and was 
presented in Table 6.6, a similar LMS algorithm can be developed here. This is left 
as an exercise problem at the end of this chapter. The resulting recursions will be 

$$\mathbf{w}^{+}(n) = \mathbf{w}(n) - 2\mu\,y^{*}(n)\mathbf{x}(n) \tag{18.58}$$

and 

$$\mathbf{w}(n+1) = \mathbf{w}^{+}(n) + \frac{1}{M}\mathbf{C}^T\left(\mathbf{c}_0 - \mathbf{C}\mathbf{w}^{+}(n)\right) \tag{18.59}$$

We refer to these update equations as the LMS-Frost algorithm. 
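
One iteration of the two-step recursion can be sketched as follows. The sketch is restricted to real-valued signals for brevity, and the projection step is written with the factor $1/M$; since $\mathbf{C}\mathbf{C}^T = M\mathbf{I}$ for $\mathbf{C} = [\mathbf{I}\ \cdots\ \mathbf{I}]$, this factor is what makes $\mathbf{C}\mathbf{w}(n+1) = \mathbf{c}_0$ hold exactly. The stacking order of $\mathbf{w}$ (filter by filter) and the function names are assumptions.

```python
def frost_project(w_plus, c0, M):
    """w(n+1) = w+(n) + (1/M) C^T (c0 - C w+(n)), with C = [I I ... I] (M blocks of size N)."""
    N = len(c0)
    # C w+ adds up the i-th tap of all M filters.
    col_sums = [sum(w_plus[m * N + i] for m in range(M)) for i in range(N)]
    corr = [(c0[i] - col_sums[i]) / M for i in range(N)]
    # C^T spreads the same correction over the i-th tap of every filter.
    return [w_plus[m * N + i] + corr[i] for m in range(M) for i in range(N)]


def lms_frost_step(w, x, c0, M, mu):
    """One LMS-Frost iteration for real-valued signals; returns (w(n+1), y(n))."""
    y = sum(wi * xi for wi, xi in zip(w, x))                    # array output y(n)
    w_plus = [wi - 2.0 * mu * y * xi for wi, xi in zip(w, x)]   # unconstrained LMS step
    return frost_project(w_plus, c0, M), y


if __name__ == "__main__":
    M, N = 3, 2
    c0 = [1.0, 0.0]                       # C0(z) = 1, i.e. no filtering of the desired signal
    w = [1.0 / M, 0.0] * M                # initial weights that satisfy C w = c0
    x = [0.5, -1.0, 2.0, 0.3, -0.7, 1.1]  # hypothetical tap values
    w_next, y = lms_frost_step(w, x, c0, M, 0.01)
    print(y, w_next)
```

Because the projection is applied at every iteration, the constraint is restored after each update and rounding errors do not accumulate in it.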
Subband Beamforming 


A broadband signal may be partitioned into a number of narrowband signals through an 
analysis filter bank (AFB). The subband signals may then be processed to match a certain 
target. The processed subband signals are subsequently combined to reconstruct 
the processed broadband desired signal. This concept was studied thoroughly and the 
respective adaptive filters were developed in Chapter 9. 

In the context of beamformers, to benefit from the narrowband beamforming techniques 
that were developed in Section 18.1, the number of subbands should be selected 
sufficiently large so that each subband signal can be modeled by a flat spectrum over the 
band in which it is nonzero, and hence the phasor representation will be applicable. Once this is 
established, any of the narrowband beamforming techniques can be applied to each set of 
subband signals separately. Figure 18.19 presents the structure of the subband beamformer. 
The signal from each antenna is passed through an AFB, and the same-indexed subband 
signals from the AFBs are passed to a narrowband beamformer. The outputs from the 
narrowband beamformers are combined using a synthesis filter bank (SFB). 


18.3 Robust Beamforming 


So far, in this chapter, we have made the idealized assumption that the antenna elements 
are placed precisely at their expected positions. For example, in a linear array, the 
antenna elements are placed along a straight line and are equally spaced at a distance 
of one half of the wavelength of the plane-wave impinging on the array. As noted at the 
beginning of this chapter, these idealized assumptions are usually not true. In the context 
of narrowband beamformers, the assumption that the steering vector of a plane-wave 
impinging on the array is given by Eq. (18.7) will become inaccurate. The following example 
provides some insight into the impact of such a wrong assumption. 

Figure 18.19 Subband beamforming. 


Example 18.4 


Consider a 10-element linear array that has been designed for an element spacing of one 
half of the wavelength of the impinging signals. However, owing to an implementation error, 
the spacing between the antennas has turned out to be 5% shorter than the desired length. 
Study the generated optimum array pattern for the look direction of $45^\circ$, assuming the 
steering vector is calculated based on the assumption that the antenna spacing is that of 
the original design. Also, assume that in addition to the desired plane-wave signal, there 
are two interfering signals with powers of $P_1 = P_2 = 1$ impinging on the array from the angles 
$75^\circ$ and $-30^\circ$. The desired signal has the power of $P_0 = 4$. There is also a background 
white noise at each array element with variance $\sigma_\nu^2 = 0.01$. 


Solution: The steering vector for the look direction, calculated based on the assumption 
that the element spacing is one half of the wavelength, is 

$$\mathbf{s}_a(\theta_0) = \begin{bmatrix} 1 \\ e^{j\pi\sin 45^\circ} \\ \vdots \\ e^{j9\pi\sin 45^\circ} \end{bmatrix} \tag{18.60}$$

The steering vectors for the interfering signals as well as that for the desired signal, for 
the element spacing of 95% of one half of the wavelength, on the other hand, are 

$$\mathbf{s}(\theta) = \begin{bmatrix} 1 \\ e^{j0.95\pi\sin\theta} \\ \vdots \\ e^{j0.95\times 9\pi\sin\theta} \end{bmatrix}, \quad \text{for } \theta = 45^\circ,\ 75^\circ \text{ and } -30^\circ \tag{18.61}$$

Using these, we calculate the correlation matrix 

$$\mathbf{R} = \sum_{k=0}^{2} P_k\,\mathbf{s}(\theta_k)\mathbf{s}^H(\theta_k) + \sigma_\nu^2\,\mathbf{I}$$

and use the result to find the coefficients of the optimum beamformer according 
to Eq. (18.44) with $\mathbf{s}(\theta_0)$ substituted with $\mathbf{s}_a(\theta_0)$. Subsequently, the array gain is obtained 
using Eq. (18.10) with $\mathbf{w}$ replaced by $\mathbf{w}_o^c$ and for $\mathbf{s}(\theta)$ given by Eq. (18.61). 

For the numerical values given in this example, Figure 18.20 presents the beam pattern 
of the implemented array. As seen, the design generates an undesirable null in the look 
direction $\theta_0 = 45^\circ$, meaning that the desired signal will be blocked by the array. 

To explain this behavior of the array and develop further insight into the problem, 
we note that the steering vector $\mathbf{s}_a(\theta_0)$ given by Eq. (18.60), for the implemented array, 
corresponds to the look direction $\theta_0'$, which is obtained by solving the equation 

$$0.95\pi\sin\theta_0' = \pi\sin 45^\circ$$

This gives $\theta_0' = 48.1^\circ$, which in turn means that, by design, the array should have 
a gain of one at the angle $48.1^\circ$. Careful examination of Figure 18.20 reveals that this 
indeed is the case. 

Figure 18.20 Beam pattern of the implemented array of Example 18.4 with inaccurate knowledge 
of the element spacing. 
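
The value of $48.1^\circ$ can be checked with a short calculation. The function name and its arguments are illustrative; `spacing_ratio` denotes the ratio of the actual element spacing to the design spacing, here 0.95.

```python
import math


def apparent_look_direction(theta_design_deg, spacing_ratio):
    """Solve spacing_ratio * pi * sin(theta') = pi * sin(theta_design) for theta', in degrees."""
    ratio = math.sin(math.radians(theta_design_deg)) / spacing_ratio
    return math.degrees(math.asin(ratio))  # raises ValueError if no real solution exists


if __name__ == "__main__":
    print(apparent_look_direction(45.0, 0.95))
```

The same relation shows that the mismatch grows with the look angle: for a fixed spacing error, directions closer to $90^\circ$ are shifted more, and beyond some angle no real solution exists.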

One more observation from Figure 18.20 is that, to block the 
signal arriving from the angle $45^\circ$ while assuring a gain of unity in the direction $48.1^\circ$, 
the generated beam pattern has developed relatively large side lobes. These large side lobes 
result in some noise amplification at the array output. Hence, the array output suffers 
from both cancellation of the desired signal and an increase in the background 
noise. These are of course undesirable, and measures to avoid them should be taken. 

To further see the deviation of the faulty design of Figure 18.20 from an optimum 
design, Figure 18.21 presents the generated beam pattern when the exact spacing of the 
array elements is taken into account in forming the steering vector for the look direction. 
Comparing this beam pattern with the one in Figure 18.20, one will observe that a perfect 
main lobe is generated in the look direction, and there are no significant lobes in other 
directions. That is, only the desired signal is allowed to pass through the array. The 
plane-wave signals from other directions and the noise are greatly suppressed. 

Figure 18.21 Beam pattern of the implemented array of Example 18.4 with accurate knowledge 
of the element spacing. 

The above example clearly shows that a small deviation of the installed array from 
its original specifications can result in a serious failure of the optimum beamformer. 
One immediate solution to avoid such a failure is to impose additional restrictions when 
designing the array tap weights $w_i$. For instance, some researchers have proposed to add 
constraints that flatten the array gain around the look direction. This is 
achieved by restricting the derivatives of the array gain to remain close to zero 
around the look direction; see, for example, (Er and Cantoni, 1983; Buckley and Griffiths, 
1986) and (Stoica, Wang, and Li, 2003). Others have introduced methods that take 
into account some uncertainty in the steering vector that defines the look direction and 
accordingly optimize the array coefficient vector $\mathbf{w}$. Here, to develop some insight into 
the operation of the class of robust beamformers, we first introduce the concept of soft 
constraint and study the resulting solutions. This paves the way for an in-depth 
understanding of a few robust beamforming methods from the literature that will be presented 
subsequently. 


18.3.1 Soft-Constraint Minimization 


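Recall that in the narrowband optimum beamformer, the beamformer output power is 
minimized subject to the constraint of unity array gain in the look direction. Mathematically, 
this is formulated as in Eq. (18.43) and leads to the solution (18.44). 

One can also arrive at the same solution by minimizing the amended cost function 

$$\xi = \mathbf{w}^H\mathbf{R}\mathbf{w} + K\,|\mathbf{w}^H\mathbf{s}(\theta_0) - 1|^2 \tag{18.62}$$

and finding the result as $K \to \infty$. This result can be easily proved as follows. 
We first note that Eq. (18.62) can be expanded as 

$$\xi = \mathbf{w}^H\left(\mathbf{R} + K\,\mathbf{s}(\theta_0)\mathbf{s}^H(\theta_0)\right)\mathbf{w} - K\,\mathbf{w}^H\mathbf{s}(\theta_0) - K\,\mathbf{s}^H(\theta_0)\mathbf{w} + K \tag{18.63}$$

Next, setting the gradient of $\xi$ with respect to $\mathbf{w}$ equal to zero and solving for $\mathbf{w}$, we 
obtain 

$$\mathbf{w} = \left(\mathbf{R} + K\,\mathbf{s}(\theta_0)\mathbf{s}^H(\theta_0)\right)^{-1} K\,\mathbf{s}(\theta_0) \tag{18.64}$$

Using the matrix inversion lemma (that was introduced in Chapter 6), one finds that 

$$\left(\mathbf{R} + K\,\mathbf{s}(\theta_0)\mathbf{s}^H(\theta_0)\right)^{-1} = \mathbf{R}^{-1} - \frac{K\,\mathbf{R}^{-1}\mathbf{s}(\theta_0)\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}}{1 + K\,\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)} \tag{18.65}$$

Substituting Eq. (18.65) in Eq. (18.64) and simplifying the result, we get 

$$\mathbf{w} = \frac{K\,\mathbf{R}^{-1}\mathbf{s}(\theta_0)}{1 + K\,\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)} \tag{18.66}$$

Finally, as $K \to \infty$, Eq. (18.66) simplifies to 

$$\mathbf{w} = \frac{\mathbf{R}^{-1}\mathbf{s}(\theta_0)}{\mathbf{s}^H(\theta_0)\mathbf{R}^{-1}\mathbf{s}(\theta_0)} \tag{18.67}$$

This, clearly, has the same form as Eq. (18.44). 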

Alternatively, one can find the coefficient vector $\mathbf{w}$ that minimizes $\xi$ for some large 
positive value of $K$. This solution may be thought of as the one that minimizes the cost 
function $\xi = \mathbf{w}^H\mathbf{R}\mathbf{w}$ subject to $\mathbf{w}^H\mathbf{s}(\theta_0) - 1 \approx 0$. Hence, this approach may be referred 
to as soft-constraint minimization. 

An advantage of soft-constraint minimization is that it extends easily to multiple
constraints while the problem remains straightforward to solve. For instance, to
broaden the main lobe of the beam pattern in the look direction, one may choose to minimize
the cost function

$$\xi = \mathbf{w}^H\mathbf{R}\mathbf{w} + K\sum_k |\mathbf{w}^H\mathbf{s}(\theta_0 + \delta_k) - 1|^2 \tag{18.68}$$

where $\delta_k$ are a set of perturbation angles around zero that are added to make sure that the
gain of the main lobe of the beam pattern remains close to one around the look direction $\theta_0$.
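
Repeating the gradient step that led from Eq. (18.62) to Eq. (18.64) suggests the following closed form for the minimizer of Eq. (18.68). The shorthand $\mathbf{Q}$, $\mathbf{q}$, and $N$ (the number of perturbation angles) is introduced here only for illustration.

```latex
\begin{aligned}
\mathbf{Q} &= \sum_k \mathbf{s}(\theta_0+\delta_k)\,\mathbf{s}^H(\theta_0+\delta_k),
\qquad
\mathbf{q} = \sum_k \mathbf{s}(\theta_0+\delta_k), \\
\xi &= \mathbf{w}^H(\mathbf{R} + K\mathbf{Q})\mathbf{w}
      - K\mathbf{w}^H\mathbf{q} - K\mathbf{q}^H\mathbf{w} + KN, \\
\nabla_{\mathbf{w}}\,\xi = \mathbf{0}
\;&\Longrightarrow\;
\mathbf{w} = (\mathbf{R} + K\mathbf{Q})^{-1} K\mathbf{q}.
\end{aligned}
```

With a single angle $\delta_k = 0$, this reduces to Eq. (18.64).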


Example 18.5 


Consider the beamforming scenario that was discussed in Example 18.4. To avoid the
introduction of a null in the beam pattern at the angle $\theta_0 = 45^\circ$, we let the perturbation $\delta_k$
in Eq. (18.68) be a Gaussian random variable with standard deviation $\sigma_\delta$. Also, we let
$K = 1$ and run the summation on the right-hand side of Eq. (18.68) over 10,000 random
choices of $\delta_k$. Also, let each array element be subject to an independent white noise with
variance $\sigma_\nu^2 = 0.01$. Study the generated beam pattern for $\sigma_\delta = 2^\circ$.
Solution: The results can be generated using the following MATLAB code. The variable
names used here are similar to those defined above.


% Example 18.5
% M, P0, P1, and P2 are not set in this listing; they are assumed to be
% defined beforehand, as in the scenario of Example 18.4.
sigma_d=2;                  % standard deviation of the perturbation (degrees)
sigmanu2=0.01;              % noise variance at each array element
theta0=45*pi/180;
theta1=75*pi/180;
theta2=-30*pi/180;
so=exp(1i*pi*sin(theta0)*[0:M-1]');
s0=exp(1i*0.95*pi*sin(theta0)*[0:M-1]');
s1=exp(1i*0.95*pi*sin(theta1)*[0:M-1]');
s2=exp(1i*0.95*pi*sin(theta2)*[0:M-1]');
R=P0*s0*s0'+P1*s1*s1'+P2*s2*s2'+sigmanu2*eye(M);
p=0;
for k=1:10000
    % accumulate the soft-constraint terms of Eq. (18.68) with K = 1
    theta=(45+sigma_d*randn)*pi/180;
    so=exp(1i*pi*sin(theta)*[0:M-1]');
    R=R+so*so';
    p=p+so;
end
woc=R\p;
for k=0:360
    theta=k*pi/180;
    G(k+1)=woc'*exp(1i*0.95*pi*[0:M-1]'*sin(theta));
end
theta=[0:360]*pi/180;
figure(1),polar(theta,abs(G))
figure(2),plot(theta*180/pi,abs(G))


The results are presented in Figure 18.22. Note that an array gain of around one is
generated over the angles of $35^\circ$ to $60^\circ$, allowing a robust performance for some variation
of the position of the array elements. There are also nulls in the directions of the interference
signals impinging on the array from $\theta_1 = 75^\circ$ and $\theta_2 = -30^\circ$. Nevertheless,
the array gain has increased significantly in other directions. This, as discussed before,
may result in a significant noise enhancement at the array output.


18.3.2 Diagonal Loading Method 


In Example 18.5, we observed that keeping the array gain flat around the desired look
direction may increase the gain significantly in other directions. We may also recall that, in
Example 18.4 (Figure 18.20), keeping the array gain equal to one in the wrongly set look
direction, while generating a null in the correct look direction (arising from the
erroneous set-up of the array), made large gains appear in the beam pattern. These problems
are resolved, and a more robust array is obtained, if a regularization term $\gamma\mathbf{w}^H\mathbf{w}$
is added to the cost function $\xi$ in Eqs. (18.43), (18.68), or other similar equations. This
is equivalent to artificially increasing the background noise in the array, and it results in a
significantly improved beam pattern. The following examples clarify this point.

Figure 18.22 Beam pattern of implemented array of Example 18.5.


Example 18.6 


For the case presented in Example 18.5, when diagonal loading is added, $\mathbf{w}$ is found by
minimizing the cost function

$$\xi = \mathbf{w}^H\mathbf{R}\mathbf{w} + K\sum_k |\mathbf{w}^H\mathbf{s}(\theta_0 + \delta_k) - 1|^2 + \gamma\mathbf{w}^H\mathbf{w} \tag{18.69}$$

Keeping the parameters listed in the MATLAB code in Example 18.5 and finding $\mathbf{w}$
that minimizes $\xi$ in Eq. (18.69), we obtain an array whose beam pattern is presented in
Figure 18.23. Here, the parameter $\gamma$ is set equal to one. As seen, the large beam pattern
gains have been significantly reduced; take note of the amplitude scales.
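
The effect of the regularization term on the size of the weight vector can be sketched as follows. This is a hedged illustration, not the code behind Figure 18.23: the array size, the unit signal powers, the fixed grid of perturbation angles (used in place of random draws), and the helper names `sa` and `sp` are all assumptions made here.

```matlab
% Effect of the regularization term gamma*w'*w of Eq. (18.69).
% Assumed setup: M = 8, actual phase factor 0.95*pi versus presumed pi
% (the kind of mismatch used in Example 18.5), unit signal powers.
M = 8;  m = (0:M-1).';
K = 1;  gamma = 1;  sigmanu2 = 0.01;
sa = @(th) exp(1i*0.95*pi*sin(th)*m);    % actual steering vector (assumed)
sp = @(th) exp(1i*pi*sin(th)*m);         % presumed steering vector (assumed)
R = sa(45*pi/180)*sa(45*pi/180)' + sa(75*pi/180)*sa(75*pi/180)' ...
    + sa(-30*pi/180)*sa(-30*pi/180)' + sigmanu2*eye(M);
A = R;  p = zeros(M,1);
for delta = -4:0.5:4                     % perturbation angles, in degrees
    s = sp((45+delta)*pi/180);
    A = A + K*(s*s');                    % soft-constraint terms
    p = p + K*s;
end
w_plain  = A\p;                          % minimizer of Eq. (18.68)
w_loaded = (A + gamma*eye(M))\p;         % minimizer of Eq. (18.69)
fprintf('norm(w): %.3f without loading, %.3f with loading\n', ...
        norm(w_plain), norm(w_loaded));
```

Because the output power due to white noise is proportional to $\mathbf{w}^H\mathbf{w}$, the smaller norm of the loaded solution is one way to see why diagonal loading limits noise enhancement.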


Diagonal loading also helps in improving the robustness of other designs. A few prob- 
lems at the end of this chapter provide additional insights. 

Figure 18.23 Beam pattern of implemented array of Example 18.6.


18.3.3 Methods Based on Sample Matrix Inversion 


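Recall the optimum beamformer design formulation (18.43) and note that the constraint
$\mathbf{w}^H\mathbf{s}(\theta_0) = 1$ may also be written as

$$\mathbf{w}^H\mathbf{R}_{ss}\mathbf{w} = 1 \tag{18.70}$$

where $\mathbf{R}_{ss} = \mathbf{s}(\theta_0)\mathbf{s}^H(\theta_0)$. Accordingly, an alternative formulation of the optimum
beamformer may be presented as

$$\min_{\mathbf{w}} \mathbf{w}^H\mathbf{R}\mathbf{w}, \quad \text{subject to the constraint } \mathbf{w}^H\mathbf{R}_{ss}\mathbf{w} = 1 \tag{18.71}$$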
w 


Furthermore, it is instructive to note that the constraint $\mathbf{w}^H\mathbf{R}_{ss}\mathbf{w} = 1$ is the equivalent
of choosing $\mathbf{w}$ for a unity array gain in the look direction. This constrains the (desired)
signal power to a fixed value. On the other hand, $\mathbf{w}^H\mathbf{R}\mathbf{w}$ is equal to the signal plus
interference and noise power at the array output. Moreover, assuming that the desired
signal is uncorrelated with interference signals and noise, one may decompose $\mathbf{R}$ as
$\mathbf{R}_s + \mathbf{R}_i$, where $\mathbf{R}_s$ is the part that arises from the signal of interest and $\mathbf{R}_i$ is the part
that arises from interference signals and noise. In addition, $\mathbf{R}_s = \eta\mathbf{R}_{ss}$, where $\eta$ is some
constant related to the level of the signal of interest. Finally, one may define the SINR at
the array output as

$$\mathrm{SINR} = \frac{\mathbf{w}^H\mathbf{R}_s\mathbf{w}}{\mathbf{w}^H\mathbf{R}_i\mathbf{w}} = \eta\,\frac{\mathbf{w}^H\mathbf{R}_{ss}\mathbf{w}}{\mathbf{w}^H\mathbf{R}_i\mathbf{w}} \tag{18.72}$$

It is straightforward to show that SINR can be maximized, equivalently, by maximizing
the ratio

$$\frac{\mathbf{w}^H\mathbf{R}_{ss}\mathbf{w}}{\mathbf{w}^H\mathbf{R}\mathbf{w}}.$$

This clearly is equivalent to the constrained minimization problem (18.71).
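We use the method of Lagrange multipliers to solve Eq. (18.71). To this end, we define

$$\xi = \mathbf{w}^H\mathbf{R}\mathbf{w} + \lambda\left(1 - \mathbf{w}^H\mathbf{R}_{ss}\mathbf{w}\right) \tag{18.73}$$

where $\lambda$ is a Lagrange multiplier. Setting the gradient of $\xi$ with respect to $\mathbf{w}$ equal to
zero, one finds that the minimizer of $\xi$ should satisfy the following equation:

$$\mathbf{R}\mathbf{w} = \lambda\mathbf{R}_{ss}\mathbf{w}. \tag{18.74}$$

This may be viewed as a generalized eigen-problem. The solution of the constrained
minimizer of $\mathbf{w}^H\mathbf{R}\mathbf{w}$ is one of the $M$ eigenvalue-eigenvector pairs $(\lambda, \mathbf{w})$ that satisfies
Eq. (18.74). Multiplying Eq. (18.74) from the left by $\mathbf{w}^H$, we obtain

$$\mathbf{w}^H\mathbf{R}\mathbf{w} = \lambda\mathbf{w}^H\mathbf{R}_{ss}\mathbf{w} = \lambda \tag{18.75}$$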


Noting that $\mathbf{R}$ is a correlation matrix and, thus, is positive definite, all the eigenvalues that
satisfy Eq. (18.74) are positive. Also, as the goal is to minimize $\mathbf{w}^H\mathbf{R}\mathbf{w}$, we are interested
in the eigenvector of Eq. (18.74) that is associated with its minimum eigenvalue.

Next, rearranging Eq. (18.74) as

$$\left(\mathbf{R}^{-1}\mathbf{R}_{ss}\right)\mathbf{w} = \frac{1}{\lambda}\mathbf{w} \tag{18.76}$$

we conclude that the optimum $\mathbf{w}$ that minimizes $\mathbf{w}^H\mathbf{R}\mathbf{w}$, subject to the constraint
$\mathbf{w}^H\mathbf{R}_{ss}\mathbf{w} = 1$ (or, equivalently, $\mathbf{w}^H\mathbf{s}(\theta_0) = 1$), is given by

$$\mathbf{w}_o = \mathcal{P}\left\{\mathbf{R}^{-1}\mathbf{R}_{ss}\right\} \tag{18.77}$$

where $\mathcal{P}\{\cdot\}$ refers to the principal eigenvector of a matrix, that is, the one that is associated
with its largest eigenvalue. The solution (18.77), when normalized to satisfy the constraint
$\mathbf{w}^H\mathbf{R}_{ss}\mathbf{w} = 1$, will be the same as Eq. (18.44). Next, we discuss practical applications of
Eq. (18.77) in designing a few robust beamformers.

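Given the steering vector $\mathbf{s}(\theta_0)$ and the measured correlation matrix $\hat{\mathbf{R}}$ according to
Eq. (18.20) or Eq. (18.21), one can substitute the estimated matrix for $\mathbf{R}$ in Eq. (18.77) to obtain
an estimate of $\mathbf{w}_o$ as

$$\hat{\mathbf{w}}_o = \mathcal{P}\left\{\hat{\mathbf{R}}^{-1}\mathbf{R}_{ss}\right\} \tag{18.78}$$

This method is often referred to as the sample matrix inversion (SMI).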
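
The claim that the normalized principal eigenvector reproduces the solution (18.67), which has the same form as Eq. (18.44), can be checked numerically. The sketch below uses an exact correlation matrix rather than a measured one, and the array size, angles, and powers are illustrative assumptions.

```matlab
% Principal eigenvector of R^{-1}*Rss, Eq. (18.77), versus Eq. (18.67).
% Assumed setup: M = 6, look direction 20 degrees, one interferer at -40.
M = 6;  m = (0:M-1).';
s0 = exp(1i*pi*sin(20*pi/180)*m);
s1 = exp(1i*pi*sin(-40*pi/180)*m);
R   = s0*s0' + 2*(s1*s1') + 0.1*eye(M);
Rss = s0*s0';
[V, D] = eig(R\Rss);
[~, imax] = max(real(diag(D)));          % largest eigenvalue, i.e., 1/lambda
w = V(:, imax);
w = w/(s0'*w);                           % scale so that w'*s0 = 1
w_mvdr = (R\s0)/(s0'*(R\s0));            % Eq. (18.67)
disp(norm(w - w_mvdr))                   % should be at rounding-error level
```

Replacing `R` by a time-averaged estimate gives the SMI form of Eq. (18.78).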
The SMI method, if implemented according to Eq. (18.78), suffers from the same
sensitivity problems as the original optimum beamformer. One common method of reducing
this sensitivity, and thus arriving at a robust design, is to replace $\mathbf{R}_{ss}$ by

$$\tilde{\mathbf{R}}_{ss} = \mathbf{T} \odot \mathbf{R}_{ss} \tag{18.79}$$

where $\mathbf{T}$ is a properly chosen Toeplitz matrix and $\odot$ denotes element-wise multiplication.
Common choices of T are 
(TL. (18.80) 


and 


[Tan =e (18.81) 


m,n 
where g is a positive parameter. T is called tapered matrix because of obvious reasons. 

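The application of a tapered matrix to both $\hat{\mathbf{R}}$ and $\mathbf{R}_{ss}$ has also been introduced by some
researchers. This method, which is referred to as covariance matrix taper (CMT), replaces
$\hat{\mathbf{R}}$ and $\mathbf{R}_{ss}$ in Eq. (18.78) by

$$\tilde{\mathbf{R}} = \mathbf{T} \odot \hat{\mathbf{R}} \quad \text{and} \quad \tilde{\mathbf{R}}_{ss} = \mathbf{T} \odot \mathbf{R}_{ss} \tag{18.82}$$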


respectively. A detailed review of these methods and their extensions can be found in 
(Shahbazpanahi et al., 2003). 


Problems 


P18.1 The MATLAB code used to generate the result of Figure 18.7 is available on 
the accompanying website. It is called ‘DOABartlett.m’. 


(1) Examine this code and explain how it relates to the mathematical equations 
in the text. 
(ii) By running ‘DOABartlett.m’, confirm the result presented in
Figure 18.7.
(iii) By making necessary changes to ‘DOABartlett.m’, examine the results
for the following parameters and discuss your observations.


(a) $\theta_0 = 10^\circ$, $\theta_1 = 30^\circ$, $\theta_2 = -30^\circ$, $P_0 = P_1 = P_2 = 1$, and $\sigma_\nu^2 = 0.01$.
(b) $\theta_0 = 10^\circ$, $\theta_1 = 30^\circ$, $\theta_2 = -30^\circ$, $P_0 = P_1 = 1$, $P_2 = 0.01$, and $\sigma_\nu^2 = 0.01$.


P18.2 The MATLAB code used to generate the result of Figure 18.8 is available on 
the accompanying website. It is called ‘DOAMVDR.m’. 


(i) Examine this code and explain how it relates to the mathematical equations 
in the text. 
(ii) By running ‘DOAMVDR.m’, confirm the result presented in Figure 18.8. 
(iii) By making necessary changes to ‘DOAMVDR.m’, examine the results for 
the following parameters and discuss your observations. 


(a) $\theta_0 = 20^\circ$, $\theta_1 = 25^\circ$, $\theta_2 = -30^\circ$, $P_0 = P_1 = P_2 = 1$, and $\sigma_\nu^2 = 1$.
(b) $\theta_0 = 20^\circ$, $\theta_1 = 25^\circ$, $\theta_2 = -30^\circ$, $P_0 = P_1 = 1$, $P_2 = 0.01$, and $\sigma_\nu^2 = 0.01$.
(c) $\theta_0 = 20^\circ$, $\theta_1 = 25^\circ$, $\theta_2 = -30^\circ$, $P_0 = P_1 = 1$, $P_2 = 0.01$, and $\sigma_\nu^2 = 0.1$.


P18.3 The MATLAB code used to generate the result of Figure 18.9 is available on
the accompanying website. It is called ‘DOAMUSIC.m’.


(i) Examine this code and explain how it relates to the mathematical equations
in the text.
(ii) By running ‘DOAMUSIC.m’, confirm the result presented in Figure 18.9.
(iii) By making necessary changes to ‘DOAMUSIC.m’, examine the results for
the following parameters and discuss your observations.


(a) $\theta_0 = 20^\circ$, $\theta_1 = 25^\circ$, $\theta_2 = -30^\circ$, $P_0 = P_1 = P_2 = 1$, and $\sigma_\nu^2 = 0.1$.
(b) $\theta_0 = 20^\circ$, $\theta_1 = 25^\circ$, $\theta_2 = -30^\circ$, $P_0 = P_1 = P_2 = 1$, and $\sigma_\nu^2 = 1$.
(c) $\theta_0 = 20^\circ$, $\theta_1 = 25^\circ$, $\theta_2 = -30^\circ$, $P_0 = P_1 = 1$, $P_2 = 0.1$, and $\sigma_\nu^2 = 0.01$.
(d) $\theta_0 = 20^\circ$, $\theta_1 = 25^\circ$, $\theta_2 = -30^\circ$, $P_0 = P_1 = 1$, $P_2 = 0.1$, and $\sigma_\nu^2 = 0.1$.
(e) $\theta_0 = 20^\circ$, $\theta_1 = 25^\circ$, $\theta_2 = -30^\circ$, $P_0 = P_1 = 1$, $P_2 = 0.5$, and $\sigma_\nu^2 = 0.1$.


P18.4 Show that the solution to the constrained minimization defined by Eqs. (18.30)
and (18.31) is $\mathbf{w} = \frac{1}{M}\mathbf{s}(\theta_0)$.


P18.5 The MATLAB code that has been used to generate Figure 18.12 is available on
the accompanying website. It is called ‘NullSteeringBF.m’.


(i) Examine this code and explain how it relates to the mathematical equations
in the text.
(ii) By running ‘NullSteeringBF.m’, confirm the result presented in
Figure 18.12.
(iii) Modify ‘NullSteeringBF.m’ to steer the beam in the direction of $\theta_1$.
Examine the modified code to confirm that the desired result is obtained.
(iv) Repeat Part (iii) for the steered beam in the direction of $\theta_2$.


P18.6 Using the method of Lagrange multipliers, prove Eq. (18.44).


P18.7 Consider a 10-element linear array with three narrowband signals impinging on it
at the angles of arrival $\theta_0 = 20^\circ$, $\theta_1 = 30^\circ$, and $\theta_2 = -10^\circ$. Considering
the discussions around Eqs. (18.45) and (18.46), find the blocking matrix for the
following cases.


(i) With $K = 1$ and $\sigma_\nu^2 = 0.1$.
(ii) With $K = 10$ and $\sigma_\nu^2 = 0.1$.
(iii) With $K = 100$ and $\sigma_\nu^2 = 0.1$.
(iv) With $K = 1000$ and $\sigma_\nu^2 = 0.1$.
(v) With $K = 1000$ and $\sigma_\nu^2 = 0.001$.
(vi) With $K = 1000$ and $\sigma_\nu^2 = 10$.
(vii) Compare the above results and discuss your observations. In particular,
study the array gain for the primary and the secondary beamformers.


P18.8 Present a derivation of the LMS-Frost algorithm, given by Eqs. (18.58) and
(18.59).
P18.9 The LMS-Frost algorithm in the original formulation of Frost (1972) appears as

$$\mathbf{w}(n+1) = \left(\mathbf{I} - \frac{1}{M}\mathbf{s}(\theta_0)\mathbf{s}^H(\theta_0)\right)\left(\mathbf{w}(n) - 2\mu y^*(n)\mathbf{x}(n)\right) + \frac{1}{M}\mathbf{s}(\theta_0)$$

Show that this is the same as Eqs. (18.58) and (18.59).


P18.10 The MATLAB code presented in Example 18.5 is available on the accompanying
website. It is called ‘Ex18_5.m’. Run this code to confirm the result presented
in Figure 18.22. Also, modify the code to study the following cases and discuss
the observed results.


(i) The case presented in Example 18.6.
(ii) The case where $\sigma_\nu^2 = 0$ and there is no regularization (parameter $\gamma$).
(iii) The case where $\sigma_\nu^2 = 1$ and there is no regularization.
(iv) The case where $\sigma_\nu^2 = 10$ and there is no regularization.


P18.11 Repeat Problem P18.10 when $\sigma_\delta$ is reduced to $1^\circ$.


P18.12 Show that when the solution (18.77) is normalized to satisfy the constraint
$\mathbf{w}^H\mathbf{R}_{ss}\mathbf{w} = 1$, it will lead to Eq. (18.44).


P18.13 Show that the choice of $\mathbf{w}$ that maximizes the SINR (18.72) also maximizes the
ratio

$$\frac{\mathbf{w}^H\mathbf{R}_{ss}\mathbf{w}}{\mathbf{w}^H\mathbf{R}\mathbf{w}}.$$


19 


Code Division Multiple 
Access Systems 


Code division multiple access (CDMA), one of the common technologies in cellular
communications, has its roots in the spread spectrum technique, a signaling method that was
initially developed for secure communications in military applications. In the context
of cellular communications, using CDMA, a number of users communicate with a base
station over the same frequency band. However, to allow separation of the users’
information, each user is assigned a different signature, called a spreading code. By careful
selection of users’ spreading codes and/or effective signal processing, it is possible to
extract the information of each user while the interference from other users (called
multiple-access interference, or MAI) and noise are minimized. In this chapter, a number of
adaptive signal processing techniques that may be used for reduction of MAI in CDMA
systems are introduced and some of their aspects are analyzed.


19.1 CDMA Signal Model 


Figure 19.1 presents the process of generating a CDMA signal. Each information symbol,
say, $s_0(n)$, from a user (here, we have chosen the 0th user as the reference) is first multiplied
by a spreading code sequence, $\{c_{0,i},\ i = 0, 1, \ldots, L-1\}$. Each element of the spreading
code sequence, $c_{0,i}$, is referred to as a chip. Also, for reasons that will become
clear later, the length of the spreading code sequence, $L$, is referred to as the spreading gain.
The information symbols $s_0(n)$ arrive at a spacing $T$ and hence have a rate of $1/T$.
Assuming that $s_0(n)$ and $c_{0,i}$ are binary, the sequence generated after spreading is also
binary but has a rate $L$ times faster than that of $s_0(n)$, that is, $L/T$. The block $p_T(t)$
is the transmit filter. Following the principles developed in Chapter 17, here, $p_T(t)$ is a
square-root Nyquist filter set to match the rate of its input sequence, $L/T$. At the channel
output, noise and signals from other users (similarly generated CDMA signals) are added.
At the receiver input, the received signal is passed through a filter $p_R(t)$ that is matched
to the transmit filter, that is, $p_R(t) = p_T(-t)$. The output of this matched filter is then
sampled at a rate $1/T_s \ge L/T$. The result is the signal sequence $x(n)$, which has to be
processed to extract the transmitted information symbols. The emphasis of this chapter
is on various processing techniques that are applied to the signal sequence $x(n)$ for this
purpose.

Figure 19.1 CDMA signal model.


19.1.1 Chip-Spaced Users-Synchronous Model 


Let us consider the simple case where the channel is absent and signals from all users are
perfectly synchronized. Also, assume that the output of the receive filter $p_R(t)$ is sampled
exactly at the middle of each chip. Under this ideal condition, the vector of the signal
samples over the interval of the $n$th information symbol $s_0(n)$ is obtained as

$$\mathbf{x}(n) = s_0(n)\mathbf{c}_0 + \sum_{k=1}^{K-1}\sqrt{P_k}\,s_k(n)\mathbf{c}_k + \mathbf{v}(n) \tag{19.1}$$

where $P_k$ is the signal power of the $k$th user (it is assumed that $P_0 = 1$), $K$ is the number of
users,

$$\mathbf{c}_k = [c_{k,0}\ \ c_{k,1}\ \cdots\ c_{k,L-1}]^T \tag{19.2}$$

$$\mathbf{x}(n) = [x_0(n)\ \ x_1(n)\ \cdots\ x_{L-1}(n)]^T \tag{19.3}$$

and $\mathbf{v}(n)$ is the noise vector. We assume that the elements of $\mathbf{v}(n)$ are a set of independent,
identically distributed zero-mean Gaussian random variables with variance $\sigma_\nu^2$. Hence, $\mathbf{v}(n)$ has the
covariance matrix

$$E[\mathbf{v}(n)\mathbf{v}^H(n)] = \sigma_\nu^2\mathbf{I} \tag{19.4}$$

where $\mathbf{I}$ is the identity matrix.
Under the condition that the spreading codes $\mathbf{c}_0, \mathbf{c}_1, \ldots$, and $\mathbf{c}_{K-1}$ are a set of orthogonal
vectors, that is, $\mathbf{c}_k^T\mathbf{c}_l = 0$ for $k \neq l$, an estimate of $s_0(n)$ can be trivially obtained as

$$\hat{s}_0(n) = \frac{\mathbf{c}_0^T\mathbf{x}(n)}{\|\mathbf{c}_0\|^2} = s_0(n) + v'(n) \tag{19.5}$$

where $\|\mathbf{c}_0\|^2 = \mathbf{c}_0^T\mathbf{c}_0$ and $v'(n) = \mathbf{c}_0^T\mathbf{v}(n)/\|\mathbf{c}_0\|^2$. Note that under this condition, the MAI (i.e.,
interference from other users) is completely removed.
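To develop more insight into the data recovery process of Eq. (19.5), we note that
in Eq. (19.1), the signal-to-noise ratio (SNR) can be calculated as

$$\mathrm{SNR}_{\mathrm{in}} = \frac{E[\|s_0(n)\mathbf{c}_0\|^2]}{E[\|\mathbf{v}(n)\|^2]} = \frac{\|\mathbf{c}_0\|^2 E[s_0^2(n)]}{L\sigma_\nu^2} = \frac{\|\mathbf{c}_0\|^2}{L\sigma_\nu^2} \tag{19.6}$$

On the other hand, considering Eq. (19.5), at the estimator output,

$$\mathrm{SNR}_{\mathrm{out}} = \frac{E[s_0^2(n)]}{E[v'^2(n)]} = \frac{1}{\sigma_\nu^2/\|\mathbf{c}_0\|^2} = \frac{\|\mathbf{c}_0\|^2}{\sigma_\nu^2} \tag{19.7}$$

Next, we define the processing gain of a CDMA system as

$$G = \frac{\mathrm{SNR}_{\mathrm{out}}}{\mathrm{SNR}_{\mathrm{in}}} \tag{19.8}$$

Substituting Eqs. (19.6) and (19.7) in Eq. (19.8), we obtain

$$G = \frac{\|\mathbf{c}_0\|^2/\sigma_\nu^2}{\|\mathbf{c}_0\|^2/L\sigma_\nu^2} = L \tag{19.9}$$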

That is, under the ideal synchronized condition assumed here, the processing gain of 
CDMA is equal to the number of chips per information symbol. 
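
The despreading step of Eq. (19.5) and the processing gain of Eq. (19.9) can be illustrated with a small example. This is a sketch under assumptions made here: the Walsh-Hadamard codes (built by the Sylvester recursion), the number of users, the interferer powers, and the noise variance are illustrative, and noise is left out of the received vector so that the removal of MAI can be seen exactly.

```matlab
% Despreading with orthogonal codes, Eqs. (19.1) and (19.5), and the
% processing gain of Eq. (19.9). Assumed setup: L = 8, K = 3 users.
H = 1;
for n = 1:3
    H = [H H; H -H];                  % 8 x 8 matrix with orthogonal columns
end
L  = size(H, 1);
c0 = H(:,2);  c1 = H(:,3);  c2 = H(:,5);
P1 = 4;  P2 = 9;                      % interferer powers (illustrative)
s  = [1; -1; 1];                      % symbols s0(n), s1(n), s2(n)
x  = s(1)*c0 + sqrt(P1)*s(2)*c1 + sqrt(P2)*s(3)*c2;   % Eq. (19.1), no noise
s0_hat = (c0.'*x)/(c0.'*c0);          % Eq. (19.5): recovers s0(n) = 1
sigma2  = 0.5;                        % noise variance for the SNR formulas
SNR_in  = (c0.'*c0)/(L*sigma2);       % Eq. (19.6)
SNR_out = (c0.'*c0)/sigma2;           % Eq. (19.7)
G = SNR_out/SNR_in;                   % Eq. (19.9): equals L
```

Even with interferers several times stronger than the desired user, the orthogonality of the codes removes their contribution from `s0_hat`.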

The presence of a channel adds the channel impulse response $h_{0,i}$ to the path between
$s_0(n)$ and $x(n)$. As a result, if a length-$L$ vector $\mathbf{x}(n)$ of the samples of $x(n)$ is extracted, the
counterpart equation to Eq. (19.1) may be written as

$$\mathbf{x}(n) = s_0(n)\mathbf{c}'_0 + \sum_{k=1}^{K-1}\sqrt{P_k}\,s_k(n)\mathbf{c}'_k + \mathbf{v}(n) \tag{19.10}$$

where the vectors $\mathbf{c}'_k$, for $k = 0, 1, \ldots, K-1$, are obtained by first convolving $c_{k,i}$ and $h_{k,i}$, and
then truncating the result to the length $L$. One may also note that the presence of the
channel results in some interference among adjacent symbols. In this chapter, we assume
that the length of each channel is much shorter than the length of the spreading code
and hence ignore such interference in Eq. (19.10) and similar equations that will be developed
later. Moreover, one should note that although the vectors $\mathbf{c}_k$ are real-valued and $p_T(t)$
and $p_R(t)$ are real functions of time, the vectors $\mathbf{c}'_k$ are not, because the channel impulse
responses $h_{k,i}$ are, in general, complex-valued. In the discussion that follows, for brevity
of notation, we remove the prime signs from the vectors $\mathbf{c}'_k$ in Eq. (19.10); however, it
is understood that unlike the vectors $\mathbf{c}_k$ in Eq. (19.1), these are complex-valued and most
likely nonorthogonal. Also, for future developments, it is convenient to present the
following vector-matrix formulation of Eq. (19.10):

$$\mathbf{x}(n) = \mathbf{C}\mathbf{s}(n) + \mathbf{v}(n) \tag{19.11}$$

where

$$\mathbf{s}(n) = [s_0(n)\ \ s_1(n)\ \cdots\ s_{K-1}(n)]^T \tag{19.12}$$

is a column vector of length $K$, and

$$\mathbf{C} = \begin{bmatrix} \mathbf{c}_0 & \sqrt{P_1}\,\mathbf{c}_1 & \cdots & \sqrt{P_{K-1}}\,\mathbf{c}_{K-1} \end{bmatrix} \tag{19.13}$$

is an $L \times K$ matrix.

Because the columns of $\mathbf{C}$ may be nonorthogonal in the presence of a channel, a trivial detector
like Eq. (19.5) will not be able to remove MAI. Hence, the main problem that is addressed
in this chapter is the following: what is a good linear estimator of the form

$$\hat{s}_0(n) = \mathbf{w}^H\mathbf{x}(n) \tag{19.14}$$

where $\mathbf{w}$ is a tap-weight vector, that results in a reasonable estimate of $s_0(n)$ and
balances between the residual MAI and noise? Note that in Eq. (19.14), $\mathbf{w}$ and $\mathbf{x}(n)$ are
complex-valued vectors. Moreover, following the notations that were first introduced in
Chapter 3, we define

$$\mathbf{x}(n) = [x_0(n)\ \ x_1(n)\ \cdots\ x_{L-1}(n)]^T \tag{19.15}$$

and

$$\mathbf{w} = [w_0\ \ w_1\ \cdots\ w_{L-1}]^T \tag{19.16}$$

where the superscript H, as in Eq. (19.14), denotes complex-conjugate transpose, or Hermitian.


19.1.2 Chip-Spaced Users-Asynchronous Model 


Consider the case where the users are not symbol synchronized. We assume that the
received signal samples are symbol synchronized with the desired user (i.e., user 0), and
the users 1 through $K-1$ have integer time offsets $L_1$ through $L_{K-1}$, respectively, with
respect to the user 0. We note that although the $L_k$’s can be either positive or negative, here,
we limit the $L_k$’s to positive values only, as this greatly simplifies the equations, without
causing any loss in the generality of our conclusions.

When $0 \le L_k \le L-1$, for $k = 1, 2, \ldots, K-1$, Eq. (19.10), with the vectors $\mathbf{c}'_k$
replaced by $\mathbf{c}_k$, converts to

$$\mathbf{x}(n) = s_0(n)\mathbf{c}_0 + \sum_{k=1}^{K-1}\sqrt{P_k}\left(s_k(n)\mathbf{c}_k^u + s_k(n-1)\mathbf{c}_k^d\right) + \mathbf{v}(n) \tag{19.17}$$

where $\mathbf{c}_k^u$ is a length $L$ vector with its first $L - L_k$ elements equal to those of $\mathbf{c}_k$ and the
rest of its elements equal to 0. On the other hand, $\mathbf{c}_k^d$ is a length $L$ vector with its first
$L - L_k$ elements equal to 0 and the last $L_k$ elements equal to those of $\mathbf{c}_k$.
The vector-matrix formulation (19.11) is also applicable to Eq. (19.17), with the
following modifications to the definitions of the vector $\mathbf{s}(n)$ and the matrix $\mathbf{C}$:

$$\mathbf{s}(n) = [s_0(n)\ \ s_1(n)\ \ s_1(n-1)\ \cdots\ s_{K-1}(n)\ \ s_{K-1}(n-1)]^T \tag{19.18}$$

and

$$\mathbf{C} = \begin{bmatrix} \mathbf{c}_0 & \sqrt{P_1}\,\mathbf{c}_1^u & \sqrt{P_1}\,\mathbf{c}_1^d & \cdots & \sqrt{P_{K-1}}\,\mathbf{c}_{K-1}^u & \sqrt{P_{K-1}}\,\mathbf{c}_{K-1}^d \end{bmatrix} \tag{19.19}$$
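
The construction of the users-asynchronous code matrix can be sketched as follows. This is an illustration under assumptions made here: the codes, powers, and offsets are arbitrary, and the phrase "equal to those of the code vector" is read as "equal to the elements of the code vector in the same positions". The variable names `cu` and `cd` are hypothetical labels for the two partial code vectors of each interfering user.

```matlab
% Code matrix for the users-asynchronous model (two columns per interferer).
% Assumed setup: L = 8 Walsh-Hadamard codes, K = 3 users.
H = 1;
for n = 1:3, H = [H H; H -H]; end
codes = H(:, [2 3 5]);            % code vectors of users 0, 1, 2 as columns
L  = size(codes, 1);  K = size(codes, 2);
P  = [1 4 9];                     % user powers, with P(1) = 1 for user 0
Lk = [0 3 5];                     % time offsets; Lk(1) is unused (user 0)
C  = codes(:,1);                  % first column: desired user
for k = 2:K
    ck = codes(:,k);
    cu = [ck(1:L-Lk(k)); zeros(Lk(k),1)];        % first L-Lk elements kept
    cd = [zeros(L-Lk(k),1); ck(L-Lk(k)+1:L)];    % last Lk elements kept
    C  = [C sqrt(P(k))*cu sqrt(P(k))*cd];
end
disp(size(C))                     % L rows and 2K-1 columns
disp(C'*C)                        % off-diagonal terms show lost orthogonality
```

Although the full codes are orthogonal, the partial vectors generally are not orthogonal to the desired user's code, which is why asynchronism produces MAI even without a channel.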


19.1.3 Fractionally Spaced Model 


The CDMA signal model, when the received signal is sampled at a rate faster than 
the chip rate (hence, fractionally spaced samples are obtained), leads to an equation 
similar to Eq. (19.11). The difference only lies in the fact that the samples of x(n) 
and s(n) are fractionally spaced. The columns of C are also samples of the spreading 
codes that are fractionally spaced. Hence, the detectors that are introduced below are
applicable to both chip-spaced and fractionally spaced as well as users-synchronous
and users-asynchronous cases. However, there is an advantage in using the fractionally
spaced samples: the receiver performance remains insensitive to small variations of the
timing phase, say, within a chip period. Hence, fractionally spaced CDMA receivers may
perform more robustly when compared with their chip-spaced counterparts. This concept
follows the same principles as those that were discussed in Chapter 17 with regard to the
symbol-spaced versus fractionally spaced equalizer.


19.2 Linear Detectors 


A linear detector finds an estimate of the desired user symbol $s_0(n)$ according to the
equation

$$\hat{s}_0(n) = \mathbf{w}^H\mathbf{x}(n) \tag{19.20}$$


where w is a tap-weight vector that is chosen based on a selected criterion. In this 
section, we review three criteria for the selection of the tap-weight vector w and develop 
the relevant detectors. 

The detectors that are presented later require one or more of the following pieces of
information to obtain the desired estimate of the tap-weight vector $\mathbf{w}$.


1. The spreading code of the desired user, the vector $\mathbf{c}_0$.

2. The spreading codes of the interfering users, the vectors $\mathbf{c}_1$ through $\mathbf{c}_{K-1}$.

3. The symbol boundaries of the desired user.

4. The symbol boundaries of the interfering users.

5. The received amplitudes of the interfering users, $\sqrt{P_k}$, for $k = 1, 2, \ldots, K-1$. We
assume $P_0 = 1$.

6. The channel noise statistics.

7. The training sequences that may be used for initial adaptation of the weight vector $\mathbf{w}$.


In the sequel, we refer to the above items as requirements (1) through (7).
19.2.1 Conventional Detector: The Matched Filter Detector


The most trivial linear detector sets

$$\mathbf{w} = \frac{\mathbf{c}_0}{\|\mathbf{c}_0\|^2} \tag{19.21}$$

This may be thought of as a filtering operation matched to the desired user signal. Its
implementation only needs the requirements (1) and (3) of the above list.

We note that because the spreading codes $\mathbf{c}_0, \mathbf{c}_1, \ldots$, and $\mathbf{c}_{K-1}$ in Eq. (19.11) are
the original spreading codes after passing through the respective channels, they are most
likely nonorthogonal, even if the original spreading codes were orthogonal. Hence, the
matched filter output is obtained as

$$\hat{s}_0(n) = s_0(n) + \sum_{k=1}^{K-1}\sqrt{P_k}\,\frac{\mathbf{c}_0^H\mathbf{c}_k}{\|\mathbf{c}_0\|^2}\,s_k(n) + v_{\mathrm{out}}(n) \tag{19.22}$$

where

$$v_{\mathrm{out}}(n) = \frac{\mathbf{c}_0^H\mathbf{v}(n)}{\|\mathbf{c}_0\|^2} \tag{19.23}$$


is the noise sample at the detector output. Also, note that the second term on the right- 
hand side of Eq. (19.22) is the interference due to the other users. Thus, it is referred to 
as MAI (multiple-access interference). 

We note that knowledge of the exact value of $\mathbf{c}_0$ requires one to know both the spreading
code of the desired user at the transmitter and the respective channel impulse
response. However, the latter is usually unavailable. Hence, in practice, $\mathbf{c}_0$ in Eq. (19.21)
is often replaced by the desired user spreading code at the respective transmitter. This
clearly incurs some loss in performance, as the receiver is not exactly matched to the 
received waveform. This problem may be solved by using the so-called rake receiver. 
A rake receiver consists of a set of filters matched to the spreading code of the desired 
user, but with different delays to extract the signal energy from the different paths of 
the channel impulse response. The outputs of these filters are combined together after 
applying a set of weight factors to obtain an output that optimally combines the received 
signals from different paths. In any case, the conventional/matched filter detector suffers 
from some level of MAI, which makes it inferior to the rest of the detectors that are 
presented in the remaining parts of this section. 
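
A small numerical case shows how MAI can defeat the matched filter detector when the interferer is strong. The two length-4 codes and the interferer power below are illustrative assumptions chosen so that the arithmetic can be followed by hand; noise is omitted.

```matlab
% Matched filter detector, Eqs. (19.21) and (19.22), with nonorthogonal codes.
% Assumed setup: L = 4, K = 2, real-valued codes, no noise.
c0 = [1; 1; 1; 1];
c1 = [1; 1; 1; -1];              % c0'*c1 = 2, so the codes are not orthogonal
P1 = 9;                          % strong interferer (illustrative)
C  = [c0 sqrt(P1)*c1];           % Eq. (19.13)
w  = c0/(c0'*c0);                % Eq. (19.21)
s  = [1; -1];                    % s0(n) = 1, s1(n) = -1
x  = C*s;                        % Eq. (19.11) with v(n) = 0
s0_hat = w'*x;                   % Eq. (19.22): 1 + MAI
mai = sqrt(P1)*(c0'*c1)/(c0'*c0)*s(2);   % MAI term of Eq. (19.22)
fprintf('s0_hat = %.2f, MAI = %.2f\n', s0_hat, mai);
```

Here the MAI term equals $-1.5$, so the detector output is $-0.5$ and a sign decision on it would be wrong even though there is no noise.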


19.2.2 Decorrelator Detector 


The decorrelator detector ignores the noise term in Eq. (19.11) and solves the equation

$$\mathbf{C}\mathbf{s}(n) = \mathbf{x}(n) \tag{19.24}$$

for the vector $\mathbf{s}(n)$ and extracts the desired user symbol from the calculated solution. This
solution is obtained by first multiplying Eq. (19.24) from the left by $\mathbf{C}^H$ and the result, from
the left, by $(\mathbf{C}^H\mathbf{C})^{-1}$. These steps lead to the solution

$$\hat{\mathbf{s}}(n) = (\mathbf{C}^H\mathbf{C})^{-1}\mathbf{C}^H\mathbf{x}(n) \tag{19.25}$$
We note that in the absence of channel noise, $\mathbf{v}(n)$, $\hat{\mathbf{s}}(n) = \mathbf{s}(n)$. Hence, the decorrelator
detector separates the users’ symbols perfectly and, thus, its output will be free of MAI.
If $\mathbf{v}(n)$ is nonzero and is included in the above equations, one will find that

$$\hat{\mathbf{s}}(n) = \mathbf{s}(n) + \mathbf{v}_{\mathrm{out}}(n) \tag{19.26}$$

where

$$\mathbf{v}_{\mathrm{out}}(n) = (\mathbf{C}^H\mathbf{C})^{-1}\mathbf{C}^H\mathbf{v}(n) \tag{19.27}$$

For the user of interest, we have

$$\hat{s}_0(n) = s_0(n) + v_{0,\mathrm{out}}(n) \tag{19.28}$$

One may also note from the above results that the tap-weight vector $\mathbf{w}$ of the
decorrelator detector, for the desired user, is equal to the first column of
$\left((\mathbf{C}^H\mathbf{C})^{-1}\mathbf{C}^H\right)^H$. This, in turn, implies that

$$\mathbf{w} = \mathbf{C} \times \left(\text{the first column of } (\mathbf{C}^H\mathbf{C})^{-1}\right) \tag{19.29}$$

Although the decorrelator detector removes MAI completely, it suffers from a possible
noise enhancement problem. To explain this, using Eq. (19.27) and recalling Eq. (19.4),
one will find that

$$E\left[\mathbf{v}_{\mathrm{out}}(n)\mathbf{v}_{\mathrm{out}}^H(n)\right] = \sigma_\nu^2(\mathbf{C}^H\mathbf{C})^{-1} \tag{19.30}$$

The diagonal elements of this matrix are equal to the noise variances at the decorrelator
detector outputs, with the first one being that of the desired user. The size of
these variances depends on the condition number (the spread of eigenvalues) of the matrix
$\mathbf{C}^H\mathbf{C}$. If this matrix has a large condition number, the diagonal elements of $(\mathbf{C}^H\mathbf{C})^{-1}$
will be a set of large numbers, and the decorrelator detector will suffer from a noise
enhancement problem.
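
The trade-off described above, complete removal of MAI at the price of a larger output noise variance, can be seen in a two-user sketch. The codes, the interferer power, and the noise variance are illustrative assumptions, not values from the text.

```matlab
% Decorrelator detector, Eqs. (19.25), (19.29), and (19.30).
% Assumed setup: L = 4, K = 2, real-valued nonorthogonal codes.
c0 = [1; 1; 1; 1];  c1 = [1; 1; 1; -1];  P1 = 9;
C  = [c0 sqrt(P1)*c1];           % Eq. (19.13)
G  = inv(C'*C);                  % (C^H C)^{-1}
w  = C*G(:,1);                   % Eq. (19.29)
x  = C*[1; -1];                  % noise-free received vector, s0(n) = 1
s0_hat = w'*x;                   % equals s0(n): MAI is removed
sigma2  = 0.1;                   % noise variance (illustrative)
var_dec = sigma2*G(1,1);         % first diagonal element of Eq. (19.30)
var_mf  = sigma2/(c0'*c0);       % matched filter noise variance, Eq. (19.23)
enh = var_dec/var_mf;            % noise enhancement factor
fprintf('s0_hat = %.2f, noise enhancement = %.3f\n', s0_hat, enh);
```

For these codes the enhancement factor is $4/3$; codes that are closer to being collinear make $\mathbf{C}^H\mathbf{C}$ worse conditioned and the factor larger.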

Another problem that the decorrelator detector may face in practice is that to find w, 
one needs an accurate estimate of C, and this may not be readily available. Moreover, 
the estimation of C over a channel may not be a straightforward task. Hence, an exact 
implementation of a decorrelator detector can be a difficult task to achieve in practice. 
See Section 19.3 for more exposure. 

Both problems of noise enhancement and exact implementation of a decorrelator
detector make this method unattractive in practice. The MMSE detector, which is introduced
next, and the blind detector, which is discussed subsequently, are significantly less complex
to implement and yet perform better. Noting these, one may find that the decorrelator
detectors are only attractive from a historical point of view.


19.2.3 Minimum Mean-Squared Error (Optimal) Detector 


The minimum mean-squared error (MMSE) detector, as its name implies, is constructed
based on the cost function

$$\xi = E\left[|s_0(n) - \hat{s}_0(n)|^2\right] \tag{19.31}$$

The tap-weight vector $\mathbf{w}$ is set/adapted to minimize $\xi$. Recalling Eq. (19.14), Eq. (19.31)
can be written as

$$\xi = E\left[|s_0(n) - \mathbf{w}^H\mathbf{x}(n)|^2\right] \tag{19.32}$$

This, clearly, is a typical Wiener filtering problem, which was thoroughly studied in
Chapter 3. It, thus, has the solution

$$\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p} \tag{19.33}$$

where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^H(n)]$ and $\mathbf{p} = E[\mathbf{x}(n)s_0^*(n)]$, because, here, the desired output of the
Wiener filter is $s_0(n)$. Moreover, the MMSE of the detector is obtained as

$$\xi_{\min} = 1 - \mathbf{p}^H\mathbf{R}^{-1}\mathbf{p} \tag{19.34}$$


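Using Eq. (19.11) and assuming that the information symbols $s_0(n)$ through $s_{K-1}(n)$
are independent of one another and they are also uncorrelated with the noise vector $\mathbf{v}(n)$,
one will find that

$$\mathbf{p} = \mathbf{c}_0 \tag{19.35}$$

and

$$\mathbf{R} = \mathbf{C}\mathbf{C}^H + \sigma_\nu^2\mathbf{I} \tag{19.36}$$

where $\mathbf{C}$ is given according to Eq. (19.13) or Eq. (19.19) depending on the system being
users synchronous or users asynchronous.
Substituting Eqs. (19.36) and (19.35) into Eqs. (19.33) and (19.34), we obtain,
respectively,

$$\mathbf{w}_o = \left(\mathbf{C}\mathbf{C}^H + \sigma_\nu^2\mathbf{I}\right)^{-1}\mathbf{c}_0 \tag{19.37}$$

and

$$\xi_{\min} = 1 - \mathbf{c}_0^H\left(\mathbf{C}\mathbf{C}^H + \sigma_\nu^2\mathbf{I}\right)^{-1}\mathbf{c}_0 \tag{19.38}$$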


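It is also instructive to note that the MMSE detector output is given by

$$\begin{aligned}
\hat{s}_0(n) &= \mathbf{w}_o^H\mathbf{x}(n) \\
&= \left(\mathbf{c}_0^H\left(\mathbf{C}\mathbf{C}^H + \sigma_\nu^2\mathbf{I}\right)^{-1}\mathbf{c}_0\right)s_0(n) + \sum_{k=1}^{K-1}\sqrt{P_k}\left(\mathbf{c}_0^H\left(\mathbf{C}\mathbf{C}^H + \sigma_\nu^2\mathbf{I}\right)^{-1}\mathbf{c}_k\right)s_k(n) \\
&\quad + \mathbf{c}_0^H\left(\mathbf{C}\mathbf{C}^H + \sigma_\nu^2\mathbf{I}\right)^{-1}\mathbf{v}(n)
\end{aligned} \tag{19.39}$$

Noting that the first term on the right-hand side of Eq. (19.39) is the desired signal, and
the rest of the terms are interference and noise, the signal-to-interference-plus-noise ratio
(SINR) at the detector output is obtained as

$$\mathrm{SINR} = \frac{\left|\mathbf{c}_0^H\left(\mathbf{C}\mathbf{C}^H + \sigma_\nu^2\mathbf{I}\right)^{-1}\mathbf{c}_0\right|^2}{\sum_{k=1}^{K-1}P_k\left|\mathbf{c}_0^H\left(\mathbf{C}\mathbf{C}^H + \sigma_\nu^2\mathbf{I}\right)^{-1}\mathbf{c}_k\right|^2 + \sigma_\nu^2\,\mathbf{c}_0^H\left(\mathbf{C}\mathbf{C}^H + \sigma_\nu^2\mathbf{I}\right)^{-2}\mathbf{c}_0} \tag{19.40}$$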

One may infer from Eq. (19.37) that the computation of $\mathbf{w}_o$, and hence the implementation
of the MMSE detector, needs requirements (1) through (6) of the above list. In practice,
where $\mathbf{w}$ is adjusted through an adaptive algorithm, one only needs to be aware of the
symbol boundaries of the desired user, that is, requirement (3). In addition, at
the beginning of each communication session, a training sequence that is known to the
receiver should be transmitted by the transmitter of the desired user; this is requirement (7).
The receiver uses these training symbols to find an estimate of $\mathbf{w}_o$ through an adaptive
algorithm. Once a good estimate of $\mathbf{w}_o$ is obtained, the system is switched to a
decision-directed mode for further tuning of the detector and/or tracking of the variations in the
channel condition. Either the LMS algorithm, the RLS algorithm, or any of their variations
may be used. More details are given in Section 19.3.

Another point that is worth noting here is that under the condition where the channel
noise is absent, and hence $\sigma_\nu^2 = 0$, the MMSE detector reduces to the decorrelator detector.
This can be explained simply by noting that when the channel noise is absent, the cost
function $\xi$ of Eq. (19.32) can be reduced to 0 by choosing $\mathbf{w}$ such that MAI is completely
removed. This, of course, is nothing but the decorrelator detector. A mathematical proof
of this concept is left as a problem at the end of this chapter.
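
Without giving the proof, the convergence of the MMSE weights to the decorrelator weights can be observed numerically. The two-user setup and the sequence of noise variances below are illustrative assumptions.

```matlab
% MMSE weights, Eq. (19.37), versus the decorrelator weights, Eq. (19.29),
% as the noise variance shrinks. Assumed setup: L = 4, K = 2.
c0 = [1; 1; 1; 1];  c1 = [1; 1; 1; -1];  P1 = 9;
C  = [c0 sqrt(P1)*c1];
G  = inv(C'*C);
w_dec = C*G(:,1);                          % Eq. (19.29)
sig2 = [1 1e-2 1e-4 1e-6];                 % decreasing noise variances
d = zeros(size(sig2));
for n = 1:numel(sig2)
    w_mmse = (C*C' + sig2(n)*eye(4))\c0;   % Eq. (19.37)
    d(n) = norm(w_mmse - w_dec);           % distance to the decorrelator
end
disp(d)                                    % shrinks with the noise variance
```

The noise variance is kept strictly positive in this sketch because $\mathbf{C}\mathbf{C}^H$ alone is singular whenever $K < L$.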

The final note here is that the MMSE detector is also referred to as optimal detector. 
Obviously, it is optimal in the sense that it minimizes the sum of the residual MAI and 
noise power at its output. Moreover, as discussed below, the MMSE detector may be seen 
as an optimal detector in the sense that it maximizes the SINR at its output. 


19.2.4 Minimum Output Energy (Blind) Detector 
Consider a detector that is set to minimize the cost function 
E = El|w'x(n)/7] (19.41) 


subject to the constraint 


wiey = 1 (19.42) 


Such a detector is called mean output energy (MOE) detector, because of the use of the 
cost function (19.41). 
Rearranging Eq. (19.11) as 

$$\mathbf{x}(n) = \mathbf{c}_0 s_0(n) + \mathbf{C}'\mathbf{s}'(n) + \mathbf{v}(n), \tag{19.43}$$

where $\mathbf{C}'$ is obtained from $\mathbf{C}$ after removing its first column, and $\mathbf{s}'(n)$ is obtained from 
$\mathbf{s}(n)$ after removing its first element, and substituting Eq. (19.43) in Eq. (19.41), after 
applying the constraint $\mathbf{w}^H\mathbf{c}_0 = 1$, we obtain 

$$\xi = E[|s_0(n) + \mathbf{w}^H\mathbf{x}'(n)|^2] \tag{19.44}$$

where $\mathbf{x}'(n) = \mathbf{C}'\mathbf{s}'(n) + \mathbf{v}(n)$. Assuming that $s_0(n)$ is uncorrelated with the other users’ 
data and channel noise, Eq. (19.44) can be rearranged as 

$$\xi = E[|s_0(n)|^2] + E[|\mathbf{w}^H\mathbf{x}'(n)|^2] \tag{19.45}$$


As the first term on the right-hand side of Eq. (19.45) is a constant (as mentioned before, 
here, we assume that $E[|s_0(n)|^2] = 1$), independent of w, the vector w that minimizes 
$E[|\mathbf{w}^H\mathbf{x}'(n)|^2]$ is also the minimizer of the cost function $\xi$ (of course, subject to the 
constraint $\mathbf{w}^H\mathbf{c}_0 = 1$). On the other hand, noting that $E[|\mathbf{w}^H\mathbf{x}'(n)|^2]$ arises from MAI and 
noise, one may argue that the minimum output energy (MOE) detector maximizes the SINR 
at the output $y(n) = \mathbf{w}^H\mathbf{x}(n)$. 

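Next, we note that Eq. (19.45) can be rearranged as 

$$\xi = 1 + \mathbf{w}^H\mathbf{R}'\mathbf{w} \tag{19.46}$$

where $\mathbf{R}' = E[\mathbf{x}'(n)\mathbf{x}'^H(n)] = \mathbf{C}'\mathbf{C}'^H + \sigma_v^2\mathbf{I}$. Using the method of Lagrange multipliers to 
minimize $\mathbf{w}^H\mathbf{R}'\mathbf{w}$ subject to the constraint $\mathbf{w}^H\mathbf{c}_0 = 1$, one will find that the optimum 
tap-weight vector of MOE is given by 

$$\mathbf{w}_o = \frac{\mathbf{R}'^{-1}\mathbf{c}_0}{\mathbf{c}_0^H\mathbf{R}'^{-1}\mathbf{c}_0} \tag{19.47}$$

Also, 

$$\mathbf{w}_o^H\mathbf{R}'\mathbf{w}_o = \frac{1}{\mathbf{c}_0^H\mathbf{R}'^{-1}\mathbf{c}_0} \tag{19.48}$$

Moreover, the MOE SINR is obtained as 

$$\text{SINR} = \mathbf{c}_0^H\mathbf{R}'^{-1}\mathbf{c}_0 \tag{19.49}$$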


Another instructive observation that is discussed in Problem P19.3, at the end of this 
chapter, is that the right-hand sides of Eqs. (19.40) and (19.49) are equal. This in turn 
implies that the MMSE and MOE detectors are equivalent in the sense that after the 
optimum setting of their tap weights both result in the same SINR value. However, the 
adaptation algorithms that they use, as discussed in the following section, are significantly 
different. 
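
The equivalence stated above can be checked numerically with a short sketch. The code below is only an illustration under stated assumptions: the values of N, K, and the noise variance are arbitrary, the spreading codes are random, and, because Eq. (19.37) is not reproduced in this section, the MMSE tap-weight vector is taken to be $\mathbf{R}^{-1}\mathbf{c}_0$ with $\mathbf{R} = \mathbf{C}\mathbf{C}^H + \sigma_v^2\mathbf{I}$ (unit-power, uncorrelated symbols).

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, sigma2 = 10, 3, 0.01          # processing gain, users, noise variance (illustrative)
C = rng.standard_normal((N, K))     # random spreading codes, one column per user
c0, Cp = C[:, 0], C[:, 1:]          # desired user's code c_0 and interferers' codes C'

# R' = C'C'^H + sigma_v^2 I: correlation matrix of MAI plus noise
Rp = Cp @ Cp.conj().T + sigma2 * np.eye(N)
Rp_inv_c0 = np.linalg.solve(Rp, c0)

# MOE detector, Eq. (19.47), and its SINR, Eq. (19.49)
w_moe = Rp_inv_c0 / (c0.conj() @ Rp_inv_c0)
sinr_moe = np.real(c0.conj() @ Rp_inv_c0)

# MMSE detector, assumed here to be w = R^{-1} c_0 with R = CC^H + sigma_v^2 I
R = C @ C.conj().T + sigma2 * np.eye(N)
w_mmse = np.linalg.solve(R, c0)

def output_sinr(w):
    """Desired-signal power over MAI-plus-noise power at y(n) = w^H x(n)."""
    signal = abs(w.conj() @ c0) ** 2
    interference = np.real(w.conj() @ Rp @ w)
    return signal / interference

print(sinr_moe, output_sinr(w_moe), output_sinr(w_mmse))
```

The three printed numbers should agree: the two tap-weight vectors differ only by a scale factor, and a scale factor does not change the output SINR.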

The MOE detector is often referred to as a blind detector for the following 
reason. In order to estimate the vector w that minimizes the cost function (19.41) subject 
to the constraint (19.42), one does not require any training sequence. It only requires the 
desired user spreading vector $\mathbf{c}_0$ and the symbol boundaries of the desired user, that is, 
requirements (1) and (3) of the list that was presented at the beginning of this section. 

To develop further insight into the properties of the MOE detector, we proceed with a study 
of its behavior under a few specific cases. 


Noise Free Case 
Assuming that the channel noise $\mathbf{v}(n)$ is absent, Eq. (19.11) may be expanded as 

$$\mathbf{x}(n) = \mathbf{c}_0 s_0(n) + \sum_{k=1}^{K-1}\mathbf{c}_k s_k(n) \tag{19.50}$$

where the first term on the right-hand side is the desired signal and the second term is MAI. 
Assuming that K < N, the MAI belongs to a subspace of dimension K − 1 (or smaller), 
spanned by the spreading code vectors $\mathbf{c}_1$ through $\mathbf{c}_{K-1}$. We refer to this as the MAI subspace. 
The MOE detector can completely remove MAI by choosing a tap-weight vector w that 
is orthogonal to the MAI subspace. Moreover, one may write $\mathbf{c}_0$ as $\mathbf{c}_0 = \mathbf{c}_{0,1} + \mathbf{c}_{0,2}$, where 
$\mathbf{c}_{0,2}$ is the projection of $\mathbf{c}_0$ onto the MAI subspace. 
We note that as $\mathbf{c}_{0,2}$ belongs to the MAI subspace and w is orthogonal to this subspace, 
the constraint (19.42) reduces to 

$$\mathbf{w}^H\mathbf{c}_{0,1} = 1 \tag{19.51}$$

Accordingly, one will find that, in this case, the tap-weight vector 

$$\mathbf{w}_o = \frac{\mathbf{c}_{0,1}}{\|\mathbf{c}_{0,1}\|^2} \tag{19.52}$$

satisfies the constraint (19.42) and removes MAI completely. In addition, one may realize 
that when K < N, this solution is not unique. Any addition to $\mathbf{w}_o$ of Eq. (19.52) that is 
orthogonal to the spreading vectors $\mathbf{c}_0$ through $\mathbf{c}_{K-1}$ may also be seen as a solution that 
removes MAI completely and satisfies the constraint (19.42). The solution (19.52) is 
unique in the sense that, among all the solutions, it has the minimum norm. Hence, we 
refer to it as the minimum norm solution. 

Next, we present an alternative method of calculating the minimum norm tap-weight 
vector w that removes MAI and satisfies the constraint (19.42). From the earlier discussion, 
we infer that the desired tap-weight vector belongs to the subspace spanned by the columns 
of the spreading code matrix C. Hence, it may be written in the canonical form 

$$\mathbf{w} = \mathbf{C}\mathbf{w}' \tag{19.53}$$

where $\mathbf{w}'$ is a length K vector. Considering Eq. (19.53), Eq. (19.42) may be written as 

$$\mathbf{w}'^H\mathbf{c}'_0 = 1 \tag{19.54}$$

where $\mathbf{c}'_0 = \mathbf{C}^H\mathbf{c}_0$. With these, one may use the following procedure to design an MOE 
detector. 


Minimize the cost function 

$$\xi = E[|\mathbf{w}^H\mathbf{x}(n)|^2] = E[|\mathbf{w}'^H\mathbf{C}^H\mathbf{x}(n)|^2]$$

subject to the constraint (19.54). 


Recalling Eq. (19.24), here, we will find that 

$$\xi = \mathbf{w}'^H\mathbf{S}\mathbf{w}' \tag{19.55}$$

where $\mathbf{S} = (\mathbf{C}^H\mathbf{C})^2$. Note that $\mathbf{w}'$ has the length of K and S is a K × K matrix. Using the 
method of Lagrange multipliers, one will find that minimization of Eq. (19.55), subject 
to the constraint (19.54), has the following solution 


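$$\mathbf{w}'_o = \frac{\mathbf{S}^{-1}\mathbf{c}'_0}{\mathbf{c}'^H_0\mathbf{S}^{-1}\mathbf{c}'_0} \tag{19.56}$$

Also, 

$$\xi_{\min} = \frac{1}{\mathbf{c}'^H_0\mathbf{S}^{-1}\mathbf{c}'_0} \tag{19.57}$$

As MAI is completely removed, what remains at the MOE detector output is $s_0(n)$. Hence, 
$\xi_{\min} = E[|s_0(n)|^2] = 1$. Using this result and Eq. (19.57), one will find that $\mathbf{c}'^H_0\mathbf{S}^{-1}\mathbf{c}'_0 = 1$. 
This, in turn, implies 

$$\mathbf{w}'_o = \mathbf{S}^{-1}\mathbf{c}'_0 \tag{19.58}$$

Finally, using this result in Eq. (19.53), we obtain 

$$\mathbf{w}_o = \mathbf{C}\mathbf{w}'_o \tag{19.59}$$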
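
As a sanity check, the two routes to the minimum norm solution, the projection form of Eq. (19.52) and the canonical form of Eqs. (19.53) through (19.59), can be compared numerically. The following is a minimal sketch, assuming a random real-valued spreading code matrix with N = 10 and K = 3; the projector is built with a pseudo-inverse, which is one of several possible choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 10, 3                         # illustrative sizes
C = rng.standard_normal((N, K))
c0, Cp = C[:, 0], C[:, 1:]           # desired code and the MAI codes c_1 ... c_{K-1}

# Route 1, Eq. (19.52): remove from c_0 its projection onto the MAI subspace
P_mai = Cp @ np.linalg.pinv(Cp)      # orthogonal projector onto span{c_1, ..., c_{K-1}}
c01 = c0 - P_mai @ c0
w_proj = c01 / np.linalg.norm(c01) ** 2

# Route 2, Eqs. (19.53)-(19.59): w = C w' with w' = S^{-1} c_0' and S = (C^H C)^2
G = C.conj().T @ C
S = G @ G
w_prime = np.linalg.solve(S, C.conj().T @ c0)
w_canon = C @ w_prime

print(np.allclose(w_proj, w_canon))
print(np.round(w_proj.conj() @ C, 12))   # response to c_0, c_1, ..., c_{K-1}
```

The second printed line shows the detector response to each user's code: unity for the desired user and zero for the interferers, as the constraint and the MAI-removal property require.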


Spreading Code Mismatch 


In practice, because of the presence of the channel, the spreading codes of the desired 
and other users undergo some distortion before reaching the receiver. Hence, the MOE 
tap-weight vector w can only be obtained based on an estimate of the spreading code of 
the desired user, $\hat{\mathbf{c}}_0$ (after considering the channel effect). Here, we do not consider the 
receiver uncertainty of other users’ spreading codes, as the adaptation algorithms that are 
introduced later do not need such estimates. 

Let the estimate of $\mathbf{c}_0$ be $\hat{\mathbf{c}}_0$. The MOE detector attempts to minimize the cost function 
$\xi$ of Eq. (19.41) subject to the constraint 

$$\mathbf{w}^H\hat{\mathbf{c}}_0 = 1 \tag{19.60}$$

Here, also, when the channel noise is absent, the minimum norm solution w to this 
problem belongs to the subspace spanned by the columns of the spreading code matrix 
C. Hence, Eq. (19.53) through Eq. (19.56), with $\mathbf{c}_0$ replaced by $\hat{\mathbf{c}}_0$, are applicable and, thus, 

$$\mathbf{w}'_o = \frac{\mathbf{S}^{-1}\hat{\mathbf{c}}'_0}{\hat{\mathbf{c}}'^H_0\mathbf{S}^{-1}\hat{\mathbf{c}}'_0} \tag{19.61}$$

where $\hat{\mathbf{c}}'_0 = \mathbf{C}^H\hat{\mathbf{c}}_0$. Note that here the identity $\xi_{\min} = 1$ may not hold, hence, the 
simplification (19.58) is not applicable. 

In Honig et al. (1995), the authors have discussed the impact of the spreading code 
mismatch on the performance of a CDMA system. The general conclusion there is that 
under code mismatch, the MOE detector may also remove part of the energy of the desired 
user, hence, some degradation in the system performance should be expected. To remedy 
this problem, it is proposed in Honig et al. (1995) that one may start the receiver by first 
running an MOE detector adaptive algorithm for a coarse adjustment of the tap-weight 
vector w. Once w is adjusted to some degree and the output from the MOE detector can 
be considered as reliable, one can switch to a decision-directed mode adaptive MMSE 
detector. The adaptive MMSE detector converges to near the optimum solution (19.37) 
without any need to know $\mathbf{c}_0$. 


Including the Channel Noise 


In the presence of channel noise, w is no longer limited to the subspace spanned by the 
columns of C. In that case, the MOE detector has the same solution as the optimum 
detector. Hence, the MOE leads to the solution (19.37) and (19.38). Note that inserting 
$\sigma_v^2 = 0$ in Eqs. (19.37) and (19.38) does not lead to any fruitful result, as the rank N 
matrix $\mathbf{C}\mathbf{C}^H + \sigma_v^2\mathbf{I}$ converts to a rank K < N matrix $\mathbf{C}\mathbf{C}^H$. It was in the light of this 
mathematical problem that the minimum norm solution discussed earlier was derived. 
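
The rank argument can be illustrated numerically. The sketch below is an illustration only, with random codes and arbitrary sizes: it confirms that $\mathbf{C}\mathbf{C}^H$ has rank K, and then evaluates the noisy MOE solution of Eq. (19.47) for decreasing noise variance. Under these assumptions, the result appears to approach the minimum norm solution of Eq. (19.52) as the noise variance shrinks, even though setting it to exactly zero is not permissible.

```python
import numpy as np

rng = np.random.default_rng(9)
N, K = 10, 3                                  # illustrative sizes
C = rng.standard_normal((N, K))
c0, Cp = C[:, 0], C[:, 1:]

rank_CCH = np.linalg.matrix_rank(C @ C.T)     # equals K < N, so CC^H alone is singular

# Minimum norm solution, Eq. (19.52), via a least-squares projection onto the MAI subspace
c02 = Cp @ np.linalg.lstsq(Cp, c0, rcond=None)[0]
c01 = c0 - c02
w_min_norm = c01 / (c01 @ c01)

errors = []
for sigma2 in (1e-1, 1e-3, 1e-5):
    Rp = Cp @ Cp.T + sigma2 * np.eye(N)       # R' = C'C'^H + sigma_v^2 I
    w = np.linalg.solve(Rp, c0)
    w = w / (c0 @ w)                          # Eq. (19.47)
    errors.append(np.linalg.norm(w - w_min_norm))

print(rank_CCH, errors)
```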


19.2.5 Soft Detectors 


In Chapter 17, Section 17.9, it was noted that the optimum receivers could be implemented 
using the methods of soft equalization. We also introduced the methods of soft MMSE 
equalizer and statistical soft equalizer. These methods are extendable to CDMA systems 
as well and are somewhat similar to their equalizer counterparts. Interested readers are 
encouraged to refer to Wang and Poor (1999) and Farhang-Boroujeny, Zhu, and Shi (2006) 
for details. 


19.3 Adaptation Methods 


Recall that in the previous section we listed seven requirements and noted that each of 
the detectors discussed there needs a few of these requirements. In this section, we review 
further features of the discussed detectors with emphasis on the adaptive methods related 
to their implementation. 


19.3.1 Conventional Detector 


The conventional/matched filter detector needs requirements (1) and (3) only. However, 
compared to other detectors, it performs poorly. Strictly speaking, the vector $\mathbf{c}_0$ 
in requirement (1) refers to the spreading code of the desired user at the receiver side 
(obtained by convolving the corresponding spreading code at the transmitter and the channel 
impulse response). However, in practice, as the channel is unknown (and is time varying), 
$\mathbf{c}_0$ at the receiver is replaced by its counterpart at the transmitter, and this incurs further 
loss in performance. 


19.3.2 Decorrelator Detector 


The decorrelator detector, on the other hand, needs to have access to requirements 
(1) through (4). Moreover, the spreading code vectors $\mathbf{c}_0$ through $\mathbf{c}_{K-1}$ must be those at the 
receiver side. Hence, for the purpose of implementation, $\mathbf{c}_0$ through $\mathbf{c}_{K-1}$ should be 
estimated through the use of some training/pilot symbols and by adopting an adaptive algorithm. 
Either of the LMS or RLS algorithms and their alternative forms that we presented in the 
previous chapters may be used for this purpose. Moreover, as wireless channels are time 
varying, training symbols should be inserted at regular intervals between the data symbols. 
Overall, the process of estimating the spreading codes $\mathbf{c}_0$ through $\mathbf{c}_{K-1}$ is a rather 
expensive process, both in terms of computational complexity and the use of channel resources. 
The latter refers to the fact that the time used for transmission of training symbols could 
otherwise be used for transmission of data. In light of these difficulties and the fact that 
both MMSE and MOE detectors are simpler to implement and are superior in performance, 
the decorrelator detector is of interest only for historical and theoretical reasons. 
19.3.3 MMSE Detector 


The adaptation of the MMSE detector requires transmission of a training sequence by the 
desired user (requirement 7) and the knowledge of the symbol boundaries of the desired 
user (requirement 3). Once this information is available, any adaptation algorithm may 
be used to adjust the tap-weight vector w. 

In an adaptive setting, the convergence of the MMSE detector is directly linked to the 
eigenvalue distribution of the correlation matrix $\mathbf{R} = \mathbf{C}\mathbf{C}^H + \sigma_v^2\mathbf{I}$. 
In particular, when the number of users, K, is smaller than the processing gain, N (which 
is usually the case), and also the noise power, $\sigma_v^2$, is significantly smaller than the users’ 
signal powers, R will have a large eigenvalue spread. In this case, any LMS algorithm 
will suffer from slow convergence, and any RLS algorithm may experience difficulties 
because of potential round-off noise problems. The problem in the LMS algorithm may be 
solved by adopting a leaky LMS algorithm. For the RLS type algorithms, a number of 
solutions that are beyond the scope of this book have been proposed; see Sayed (2003) 
for a broad discussion on these types of algorithms. 

Another point that deserves some attention is the choice of the initial value of w, say 
w(0), before starting an adaptive LMS algorithm. We note that the N-by-N matrix $\mathbf{C}\mathbf{C}^H$ 
is a rank K matrix, thus, its N − K smaller eigenvalues are 0. Moreover, the eigenvectors 
associated with the nonzero eigenvalues of this matrix belong to the subspace spanned 
by the columns of C. In addition, assuming that the nonzero eigenvalues of $\mathbf{C}\mathbf{C}^H$ are $\lambda_0$, 
$\lambda_1$, ..., and $\lambda_{K-1}$, the eigenvalues of the correlation matrix $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^H(n)] = \mathbf{C}\mathbf{C}^H + \sigma_v^2\mathbf{I}$ 
are $\lambda_0 + \sigma_v^2$, $\lambda_1 + \sigma_v^2$, ..., $\lambda_{K-1} + \sigma_v^2$, $\sigma_v^2$, ..., and $\sigma_v^2$. Clearly, these eigenvalues 
determine the convergence rates of the various modes of convergence of the MMSE 
detector, and the last N − K eigenvalues are those associated with the subspace orthogonal 
to the signal subspace, that is, the subspace spanned by the columns of the spreading 
code matrix C. Accordingly, to avoid the slow modes of convergence in an LMS algorithm 
that may be used for adaptation of w, w(0) should be a vector within the signal subspace. 
Clearly, the most trivial solution that satisfies this condition is w(0) = 0. Also, as discussed 
in Lim et al. (2000), it may also be helpful to reset w to 0 every time a user leaves/joins 
the network, to avoid slow modes of convergence of the MMSE detector. 
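
The eigenvalue structure described above is easy to verify numerically. The sketch below is an illustration with arbitrary sizes and random real-valued codes (so the Hermitian transpose reduces to a plain transpose); it shows the N − K eigenvalues equal to the noise variance and the resulting eigenvalue spread.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, sigma2 = 10, 3, 0.01                 # illustrative values
C = rng.standard_normal((N, K))
R = C @ C.T + sigma2 * np.eye(N)           # R = CC^H + sigma_v^2 I for real-valued codes

eigvals = np.linalg.eigvalsh(R)            # eigenvalues in ascending order
noise_eigs, signal_eigs = eigvals[:N - K], eigvals[N - K:]

print("noise-subspace eigenvalues :", np.round(noise_eigs, 6))
print("signal-subspace eigenvalues:", np.round(signal_eigs, 3))
print("eigenvalue spread          :", signal_eigs[-1] / noise_eigs[0])
```

The slow modes of an LMS algorithm are those tied to the small eigenvalues, which is why an initial vector with no component in the noise subspace, such as w(0) = 0, is preferred.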


19.3.4 MOE Detector 


The adaptation of the MOE detector requires the spreading code of the desired user 
(requirement 1) and the knowledge of the symbol boundaries of the desired user (requirement 3). 
Once this information is available, any constrained adaptation algorithm may be used to 
adjust the tap-weight vector w. The most common algorithm used in practice is the 
constrained LMS algorithm (or its normalized version) that was introduced in Section 6.11. 
Also, as noted earlier, to take care of the receiver inaccuracy in its knowledge of the 
spreading code of the desired user (because the presence of the channel changes it), it is 
common to run the constrained LMS algorithm for an initial training of the MOE detector. 
Once the tap-weight vector w reaches a point where the transmitted symbols $s_0(n)$ are 
detected correctly, with a good probability, the receiver is switched to an MMSE detector 
and the symbol decisions made by the detector are used to fine tune w and track possible 
variations of the channel. 
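
The blind adaptation described above can be sketched with a projected-gradient constrained LMS update of the Frost type. This is a minimal illustration and may differ in detail from the form given in Section 6.11. The three length-8 codes, the step size, the noise level, and the BPSK symbols are all hypothetical choices made for this example, and everything is real-valued so that conjugates can be dropped.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma_v, mu, n_iter = 0.1, 0.005, 5000          # noise std, step size, iterations (illustrative)

# Three hand-picked unit-norm codes of length N = 8 (hypothetical, not from the text);
# the desired user's code c0 has cross-correlation 0.5 with each interferer.
C = np.array([[1,  1, 1, 1,  1, 1,  1,  1],
              [1,  1, 1, 1,  1, 1, -1, -1],
              [1, -1, 1, 1, -1, 1,  1,  1]], dtype=float).T / np.sqrt(8)
N, K = C.shape
c0 = C[:, 0]

f = c0 / (c0 @ c0)                              # minimum-norm vector satisfying w^H c0 = 1
P = np.eye(N) - np.outer(c0, c0) / (c0 @ c0)    # projector onto the subspace orthogonal to c0

w = f.copy()                                    # start from the matched filter
for _ in range(n_iter):
    s = rng.choice([-1.0, 1.0], size=K)         # symbols of all users; none is used as training
    x = C @ s + sigma_v * rng.standard_normal(N)
    y = w @ x                                   # detector output y(n) = w^H x(n)
    w = P @ (w - mu * y * x) + f                # gradient step on |y|^2, then re-impose constraint

Rp = C[:, 1:] @ C[:, 1:].T + sigma_v**2 * np.eye(N)

def mai_plus_noise_power(v):
    """w^H R' w: residual MAI plus noise power at the detector output."""
    return v @ Rp @ v

w_opt = np.linalg.solve(Rp, c0)
w_opt = w_opt / (c0 @ w_opt)                    # Eq. (19.47), for reference

print(w @ c0, mai_plus_noise_power(f), mai_plus_noise_power(w), mai_plus_noise_power(w_opt))
```

The update only uses the received vector and the desired user's code, which is what makes the scheme blind. The printed values compare the residual MAI-plus-noise power of the matched filter, the adapted detector, and the optimum of Eq. (19.47).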




19.3.5 Soft Detectors 


The implementation of a soft detector needs requirements (1) through (6). Requirements 
(1), (2), (5), and (6) may be replaced by requirement (7), where the latter is used 
to obtain estimates of the former. It is also possible to ignore requirement (7) and 
adopt some blind channel identification method to find the spreading codes of the users, 
requirements (1) and (2). One such method is presented in Liu and Xu (1996), where 
the authors have developed a MUSIC-like algorithm for blind detection of the spreading 
codes $\mathbf{c}_0$ through $\mathbf{c}_{K-1}$. 

It is also worth noting that a soft detector, when combined with a channel decoder in 
a turbo receiver, in the same manner as the concept of soft equalization that was 
introduced in Chapter 17 (Figure 17.35), leads to a receiver with optimum performance. 
Such receivers can approach the channel capacity bound and, thus, have become 
exceedingly popular in recent years (after introduction of the third and fourth generations of 
wireless systems). We will get to more details of such receivers in the next chapter, in 
the context of MIMO systems (i.e., transceivers that are equipped with multiple antennas 
at both transmitter and receiver sides of a communication channel). 


Problems 


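P19.1 Recall that the tap-weight vectors of the decorrelator and MMSE detectors are, 
respectively, given by Eqs. (19.29) and (19.37). Show that under the condition 
where $\sigma_v^2 \to 0$, Eq. (19.37) reduces to Eq. (19.29). 
Hint: Consider using the matrix inversion lemma formula 

$$(\mathbf{B}^{-1} + \mathbf{C}\mathbf{D}^{-1}\mathbf{C}^H)^{-1} = \mathbf{B} - \mathbf{B}\mathbf{C}(\mathbf{D} + \mathbf{C}^H\mathbf{B}\mathbf{C})^{-1}\mathbf{C}^H\mathbf{B}$$

and also the identity 

$$(\mathbf{I} + \mathbf{A})^{-1} = \mathbf{I} - (\mathbf{I} + \mathbf{A})^{-1}\mathbf{A}$$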


P19.2 Derive the results presented in Eqs. (19.47) and (19.49). 


P19.3 This problem attempts to provide further insight into the similarities and 
differences of the MMSE and MOE detectors, through numerical examples. Develop 
a MATLAB code that generates a random spreading code matrix C, according to 
“C=randn(N,K)” and performs the following studies. 

(i) Assuming N = 10, K = 3, and $\sigma_v^2 = 0.01$, evaluate the SINR of the 
MMSE detector according to Eq. (19.40) and the SINR of the MOE detector 
according to Eq. (19.49). By running your code, confirm that both SINR 
values are always the same. 

(ii) For both detectors, find the optimum tap-weight vector $\mathbf{w}_o$ according to 
Eqs. (19.37) and (19.47), respectively. The answers should not be exactly 
the same. However, they are closely related. Explain your observation. In 
particular, if we denote the ith element of $\mathbf{w}_o$ of the MMSE detector by 
$w_{o,i}^{\text{mmse}}$ and its counterpart from the MOE detector by $w_{o,i}^{\text{moe}}$, the following 
holds for all values of i: 

$$\frac{w_{o,i}^{\text{mmse}}}{w_{o,i}^{\text{moe}}} = \frac{1}{1 + 1/\text{SINR}}$$

(iii) The above numerical results can also be proven analytically, though the 
derivations are hefty. An interested reader may attempt such proofs. 


P19.4 Given the spreading code matrix C, present a numerical procedure to decompose 
$\mathbf{c}_0$ into $\mathbf{c}_0 = \mathbf{c}_{0,1} + \mathbf{c}_{0,2}$ and find the minimum norm solution (19.52). 
Hint: Consider using the projection operator concept introduced in Section 12.3. 


P19.5 Starting with a random generation of the spreading code matrix C according to 
“C=randn(N,K)” and recalling the solution to Problem P19.4, develop a MATLAB 
code to find $\mathbf{w}_o$ numerically according to Eqs. (19.52) and (19.59), and confirm 
that both lead to the same result. 


20 


OFDM and MIMO 
Communications 


Orthogonal frequency division multiplexing (OFDM) and communication systems that 
use multiple antennas at transmitter and receiver sides of the link (known as MIMO, for 
multiple-input multiple-output) heavily benefit from the various signal processing 
techniques, in general, and adaptive filters, in particular. In this chapter, we present a review 
of these methods and discuss the role of adaptive filtering techniques in implementing 
these systems and improving their performance. 


20.1 OFDM Communication Systems 


OFDM belongs to the broader class of multicarrier communication systems. The 
general concept is to divide a fast stream of data into a number of much slower substreams 
to reduce/avoid intersymbol interference (ISI); see Farhang-Boroujeny (2011) for a tutorial 
introduction to the broad class of multicarrier systems. In OFDM, the ISI is completely 
removed by inserting a guard interval equal to or longer than the duration of the channel 
impulse response before each block of data symbols. It is common to refer to each block 
of data symbols as an OFDM symbol. Hence, each OFDM symbol consists of a set of 
data symbols, each of which may be from a QAM constellation. 


20.1.1 The Principle of OFDM 


In OFDM, the transmitter begins with generating the complex baseband signal vector x(n) 
by taking the inverse DFT (IDFT) of the symbol vector $\mathbf{s}(n) = [s_0(n)\; s_1(n)\; \cdots\; s_{N-1}(n)]^T$, 
viz., 

$$\mathbf{x}(n) = \mathbf{F}^{-1}\mathbf{s}(n) \tag{20.1}$$


Adaptive Filters: Theory and Applications, Second Edition. Behrouz Farhang-Boroujeny. 
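where F is the DFT matrix and thus $\mathbf{F}^{-1}$ is the IDFT matrix. Noting that 

$$\mathbf{F}^{-1} = \frac{1}{N}\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & e^{j\frac{2\pi}{N}\times 1} & e^{j\frac{4\pi}{N}\times 1} & \cdots & e^{j\frac{2\pi(N-1)}{N}\times 1} \\
1 & e^{j\frac{2\pi}{N}\times 2} & e^{j\frac{4\pi}{N}\times 2} & \cdots & e^{j\frac{2\pi(N-1)}{N}\times 2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & e^{j\frac{2\pi}{N}\times(N-1)} & e^{j\frac{4\pi}{N}\times(N-1)} & \cdots & e^{j\frac{2\pi(N-1)}{N}\times(N-1)}
\end{bmatrix} \tag{20.2}$$

one finds that Eq. (20.1) can be expanded as 

$$\mathbf{x}(n) = \sum_{k=0}^{N-1}\mathbf{x}_k(n) \tag{20.3}$$

where 

$$\mathbf{x}_k(n) = \frac{1}{N}s_k(n)\mathbf{f}_k, \quad \text{for } k = 0, 1, 2, \ldots, N-1 \tag{20.4}$$

and 

$$\mathbf{f}_k = \begin{bmatrix} 1 \\ e^{j\frac{2\pi k}{N}\times 1} \\ e^{j\frac{2\pi k}{N}\times 2} \\ \vdots \\ e^{j\frac{2\pi k}{N}\times(N-1)} \end{bmatrix} \tag{20.5}$$

This observation shows that the kth symbol $s_k(n)$ modulates a complex carrier at frequency 
$\omega_k = \frac{2\pi k}{N}$. The scale factor $\frac{1}{N}$ comes from the definition of the IDFT and may be removed 
without any major impact on the results. Also, we refer to the $\mathbf{f}_k$’s as modulator vectors. 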
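
The expansion of the IDFT into a sum of modulated tones can be confirmed with a few lines of code. The sketch below is an illustration with N = 8 and randomly chosen QPSK symbols (both arbitrary choices); it relies on the fact that NumPy's `ifft` includes the 1/N scale factor.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 8                                                       # illustrative block length
s = rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)    # one vector of QPSK symbols

m = np.arange(N)
# f_k of Eq. (20.5): N samples of a complex tone at frequency 2*pi*k/N
f = [np.exp(1j * 2 * np.pi * k * m / N) for k in range(N)]

# Eqs. (20.3)-(20.4): x(n) as the sum of N modulated tones, scaled by 1/N
x_sum = sum(s[k] * f[k] for k in range(N)) / N

x_ifft = np.fft.ifft(s)              # Eq. (20.1); numpy's ifft applies the 1/N factor
print(np.allclose(x_sum, x_ifft))
print(np.allclose(np.fft.fft(x_ifft), s))   # a DFT at the receiver recovers the symbols
```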

For the purpose of transmission, the complex-valued sequence x(n), obtained by 
concatenating the OFDM symbol vectors $\mathbf{x}(n)$, is modulated to an RF band. At the 
receiver, through a demodulation process, x(n) or a distorted version of it is recovered; 
distortion, here, is caused by the channel and, thus, is linear. Assuming an ideal 
channel, an undistorted replica of x(n) is recovered and, thus, the application of 
a DFT to each frame (i.e., each symbol vector) of x(n) recovers the data symbols 
$s_0(n), s_1(n), \ldots, s_{N-1}(n)$. However, the sequence x(n) often has a high rate, and the 
presence of a multipath channel, in general, results in a significant level of interference 
among the samples of x(n), similar to ISI in single carrier communications (Chapter 
17). Thus, some sort of equalization is needed to combat the channel distortion. In the 
case of single carrier communications, as discussed in Chapter 17, it is common to use 
an adaptive transversal equalizer with several taps. This has the disadvantages of high 
computational complexity and slow convergence. One of the key features that has made 
OFDM popular in broadband communication systems is that the very special structure 
of the OFDM symbols allows a very low complexity equalization method. Also, for 
reasons that are explained below, the OFDM equalizer does not suffer from any slow 
convergence problem. 

The approach that OFDM designers have taken to combat the channel effect is based on the 
following observation. Each subcarrier in OFDM is a constant amplitude sinusoidal tone 
that is modulated by the respective data symbol $s_k(n)$. Moreover, the frequencies of these 
tones are selected such that their summation at the transmitter can be generated through 
an IDFT and at the receiver they can be separated through a DFT. To perform the latter 
step, the DFT is applied to a length N sequence of the samples of the received signal. On 
the other hand, we note that each tone in the transmit signal undergoes a transient, whose 
duration is equal to that of the channel impulse response, before reaching its steady state 
(i.e., becoming a modulated tone with the respective frequency) at the receiver. Hence, 
to allow separation of the data symbols at the receiver through a DFT, the tones at the 
transmitter should be extended from the length N to a length $N + N_{\rm CP}$, where $N_{\rm CP}$ is a 
duration equal to or longer than the duration of the channel impulse response. Given 
the form of the tones (i.e., Eq. 20.5), this extension can be made by taking the last $N_{\rm CP}$ 
samples of x(n) and appending them to its beginning. The added samples are thus called cyclic 
prefix or CP, in short. The subscript “CP” on $N_{\rm CP}$ is clearly selected to reflect this method 
of extending x(n). 

Figure 20.1 An OFDM signal with the added cyclic prefix samples: (a) transmitted signal x(n) 
and (b) received signal y(n). The presence of cyclic prefix samples avoids ISI among adjacent 
OFDM symbols. 

Figure 20.1 presents an OFDM signal sequence at the transmitter and its counterpart at 
the receiver. At the transmitter, a CP is added to each OFDM symbol, x(n). The CP acts 
as a guard interval that absorbs the channel impulse response. That is, the channel transient 
is absorbed within the CP, leaving the last part of each OFDM symbol a summation of 
the subcarrier tones with a similar form to Eq. (20.3). More specifically, if we call the 
last N samples of the nth OFDM symbol at the receiver y(n), it can be expanded as 

$$\mathbf{y}(n) = \sum_{k=0}^{N-1}\mathbf{y}_k(n) \tag{20.6}$$

where 

$$\mathbf{y}_k(n) = H(\omega_k) \times \frac{1}{N}s_k(n)\mathbf{f}_k, \quad \text{for } k = 0, 1, 2, \ldots, N-1 \tag{20.7}$$

$H(\omega)$ is the channel frequency response, and $\omega_k = 2\pi k/N$ is the frequency of the kth 
subcarrier. Note that this result follows as each tone, after passing through the channel and 
reaching its steady state, is affected by a gain equal to the channel frequency response at 
the respective frequency. 

Considering Eq. (20.6), the data symbols $s_k(n)$, for k = 0, 1, ..., N − 1, can be 
recovered from y(n) by taking the DFT of y(n) and applying a set of single-tap 
equalizers with the gains of $1/H(\omega_k)$, for k = 0, 1, ..., N − 1, to the respective outputs. 
It is this trivial method of equalization that has made OFDM so popular. This method of 
equalization is often referred to as frequency domain equalization. 

Summarizing the above result, an OFDM communication system is constructed as 
follows. At the transmitter, each OFDM symbol is constructed by taking the IDFT of a 
vector consisting of N data symbols, and appending a CP to its beginning. The generated 
OFDM symbols are sequentially transmitted through the channel. At the receiver, symbol 
boundaries of each OFDM symbol are identified and the CP part of it is removed. The 
CP-stripped OFDM symbols are then passed through a DFT and the DFT outputs are 
equalized using the single-tap values $w_k = 1/H(\omega_k)$, for k = 0, 1, ..., N − 1. These give 
a set of estimates of the transmitted data symbols. Clearly, in practice, the IDFT and DFT 
operations are implemented using the computationally efficient IFFT and FFT algorithms. 
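
The complete chain summarized above can be sketched in a few lines. The example below is a noise-free illustration under stated assumptions: the 4-tap channel `h`, the block length, and the CP length are hypothetical values chosen so that the channel is shorter than the CP and has no spectral null, and the receiver is assumed to know the channel and the symbol boundaries exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
N, N_cp = 64, 16                                    # block length and CP length (illustrative)
h = np.array([1.0, 0.5 - 0.3j, 0.2j, -0.1])         # hypothetical channel, shorter than the CP

s = rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)   # QPSK data symbols
x = np.fft.ifft(s)                                  # OFDM symbol, Eq. (20.1)
tx = np.concatenate([x[-N_cp:], x])                 # prepend the cyclic prefix

rx = np.convolve(tx, h)                             # noise-free multipath channel
y = rx[N_cp:N_cp + N]                               # strip the CP, keep N steady-state samples

H = np.fft.fft(h, N)                                # channel response at omega_k = 2*pi*k/N
s_hat = np.fft.fft(y) / H                           # single-tap equalizers w_k = 1/H(omega_k)

print(np.max(np.abs(s_hat - s)))
```

Because of the CP, the N retained samples equal the circular convolution of the OFDM symbol with the channel, which is why one complex gain per subcarrier is enough to undo the channel.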

In the above derivation, we ignored the channel noise whose presence, inevitably, 
results in some inaccuracy in the transmitted symbol estimates. So, at the receiver, only 
noisy estimates $\hat{s}_k(n)$ of $s_k(n)$ are obtained. We also assumed that the receiver was aware 
of the exact carrier frequency in the received signal, thus, the received OFDM symbols 
y(n) were free of any carrier offset. Moreover, we assumed that the channel had a finite 
duration shorter than or equal to the length of the CP, and the channel h(n) was known at the 
receiver, thus, perfect equalization was possible. Furthermore, it was assumed that the CP 
position could be identified by the receiver. Clearly, none of these tasks can be perfectly 
established at any receiver. In the rest of this section, we discuss a few basic concepts that 
are commonly used to obtain some estimates of the above information from the received 
signal y(n) and, hence, to perform the tasks of carrier recovery, timing synchronization, and 
channel equalization. 

To facilitate the tasks of carrier recovery, timing synchronization, and channel equal- 
ization, it has become a standard practice to partition the transmitted information into 
segments of several hundred or several thousand bytes and transmit one segment at a 
time. These segments are referred to as packets. Each packet is preambled with some 
training symbols to synchronize the receiver with the incoming packet and also to set up 
the frequency domain equalizer coefficients. For longer packets, additional pilot symbols 
are added in between the data symbols for tracking possible variations of channel, as well 
as carrier and timing drifts. 


20.1.2 Packet Structure 


Here, as a typical example, we present the packet structure proposed in the IEEE 802.11a 
standard and discuss how it is used to take care of carrier recovery, timing synchronization, 
and channel equalization. Figure 20.2 presents this packet structure. It begins with a 
preamble consisting of 10 short training symbols and 2 long training symbols. A signal 
field symbol and OFDM data symbols (the payload) then follow. 

In 802.11a, the IFFT/FFT length is equal to 64. Therefore, the maximum number of 
subcarriers is also 64. From these, only 52 subcarriers carry data/training symbols. They 
are subcarrier numbers from −26 to −1 and from 1 to 26. The rest of the subcarriers, 
including the 0th subcarrier, are set equal to 0. 

The short training, effectively, is a collection of 12 tones that are located at the center 
of the subcarriers S = {−24, −20, −16, −12, −8, −4, 4, 8, 12, 16, 20, 24}. The symbol 
values, $X_s(k)$, corresponding to these tones are listed in Table 20.1. 

Figure 20.2 Packet structure of IEEE 802.11a. 

Table 20.1 The symbol values for short training symbols in IEEE 802.11a. 

  k      $X_s(k)$ 
 −24     $\sqrt{13/6}\,(1 + j)$ 
 −20     $-\sqrt{13/6}\,(1 + j)$ 
 −16     $\sqrt{13/6}\,(1 + j)$ 
 −12     $-\sqrt{13/6}\,(1 + j)$ 
  −8     $-\sqrt{13/6}\,(1 + j)$ 
  −4     $\sqrt{13/6}\,(1 + j)$ 
   4     $-\sqrt{13/6}\,(1 + j)$ 
   8     $-\sqrt{13/6}\,(1 + j)$ 
  12     $\sqrt{13/6}\,(1 + j)$ 
  16     $\sqrt{13/6}\,(1 + j)$ 
  20     $\sqrt{13/6}\,(1 + j)$ 
  24     $\sqrt{13/6}\,(1 + j)$ 

The short symbols are thus generated according to the equation 

$$x_{\rm short}(t) = g_{\rm short}(t)\sum_{k\in S} X_s(k)\,e^{j2\pi k\Delta_f t} \tag{20.8}$$

where 

$$g_{\rm short}(t) = \begin{cases} 1, & 0 \le t \le 8\ \mu\text{s} \\ 0, & \text{otherwise} \end{cases} \tag{20.9}$$

and $\Delta_f$ is the spacing between the adjacent subcarriers. In 802.11a, if all 64 subcarriers 
were used, the total bandwidth would be 20 MHz. Hence, the subcarrier spacing is $\Delta_f = 20/64 = 
0.3125$ MHz. Substituting this value in Eq. (20.8), one finds that $x_{\rm short}(t)$ over the interval 
$0 \le t \le 8\ \mu$s is periodic and repeats 10 times. It is also worth noting that the factor $\sqrt{13/6}$ in 
the symbol values listed in Table 20.1 is to equalize the power of short training symbols 
to a level equal to that of the rest of the symbols in the packet. 
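
The periodicity of the short training follows directly from the tone positions, and a brief sketch makes it visible. The code below assumes sampling at 20 MHz, so that the subcarrier spacing of 0.3125 MHz corresponds to a 64-point IFFT grid. The tone values are random placeholders of the form ±(1 + j) rather than the actual entries of Table 20.1, since only the tone positions matter for the period.

```python
import numpy as np

N = 64
S = [-24, -20, -16, -12, -8, -4, 4, 8, 12, 16, 20, 24]    # active tones of the short training
rng = np.random.default_rng(6)

X = np.zeros(N, dtype=complex)
# Placeholder tone values +/-(1+j); the actual values are those listed in Table 20.1
X[np.array(S) % N] = rng.choice([-1, 1], len(S)) * (1 + 1j)

x64 = N * np.fft.ifft(X)              # samples of sum_k X(k) exp(j*2*pi*k*m/N), m = 0..63
x_short = np.tile(x64, 3)[:160]       # 8 us at 20 MHz sampling = 160 samples

print(np.allclose(x_short[16:], x_short[:-16]))   # repeats every 16 samples (0.8 us)
print(x_short.size // 16)                         # number of repetitions in 8 us
```

Since every active tone index is a multiple of 4, each tone completes an integer number of cycles in 64/4 = 16 samples, which gives the 10 repetitions over the 8 μs interval.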

The long training consists of two full-length OFDM symbols (3.2 μs each) with an 
extended CP of length 1.6 μs. This is twice the length of the CP in the succeeding 
OFDM symbols. The long training symbols are generated using the following equation 

$$x_{\rm long}(t) = g_{\rm long}(t)\sum_{k=-26}^{26} X_l(k)\,e^{j2\pi k\Delta_f t} \tag{20.10}$$

where 

$$g_{\rm long}(t) = \begin{cases} 1, & 8 \le t \le 16\ \mu\text{s} \\ 0, & \text{otherwise} \end{cases} \tag{20.11}$$

and the frequency domain symbol values $X_l(k)$ for −26 ≤ k ≤ 26 are given by 

{$X_l(k)$} = {1, 1, −1, −1, 1, 1, −1, 1, −1, 1, 1, 1, 1, 1, 1, −1, −1, 1, 1, −1, 
1, −1, 1, 1, 1, 1, 0, 1, −1, −1, 1, 1, −1, 
1, −1, 1, −1, −1, −1, −1, −1, 1, 1, −1, −1, 1, −1, 1, −1, 1, 1, 1, 1} 

Note that at DC, that is, for k = 0, $X_l(k) = 0$. The preamble is formed by concatenating 
$x_{\rm short}(t)$ and $x_{\rm long}(t)$. 

The signal field is an OFDM symbol that carries information about the rest of the 
packet. The information content of the signal field includes the number of information 
bits in the packet and the data rate. 


20.1.3 Carrier Acquisition 


Recall the carrier acquisition method that was introduced in Section 17.6.4 in relation to 
the cyclic equalizer. Because the preamble (both short training and long training) is periodic 
in IEEE 802.11a, similar equations to Eq. (17.106) through Eq. (17.108) are applicable. 
The receiver should first identify the short training by looking for a periodic signal with 
a period of 0.8 μs. It should then remove the transient part of the short training, simply 
by ignoring the first cycle of it, and apply Eq. (17.108) to the rest of its cycles to 
obtain an estimate of the carrier offset. The identified carrier offset is applied to correct 
the rest of the packet. This process is called coarse carrier acquisition. It brings 
the carrier offset to a value smaller than one-half of the carrier spacing in the OFDM 
data symbols. 

Further tuning of the carrier is performed using the long training. Here, first, the timing 
acquisition method that is discussed below identifies the boundaries of the long training 
symbols $T_1$ and $T_2$ (Figure 20.2). Then, as $T_1$ and $T_2$ form two periods of the same 
signal, Eq. (17.108) is applied to obtain an estimate of the residual carrier offset. This 
is referred to as fine carrier acquisition, for obvious reasons. This method of 
using periodic sequences to identify the carrier frequency offset was first introduced by 
Moose (1994) for OFDM systems. It is thus often referred to as Moose’s carrier acquisition 
method. 
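
The idea behind this class of estimators can be sketched as follows. Eq. (17.108) is not reproduced in this section, so the code below uses the commonly quoted correlation form associated with Moose's method and should be read as an assumption about, not a copy of, that equation. The periodic sequence, the offset value, and the noise level are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
L, n_rep = 16, 10                        # period and repetitions, as in the short training
one_period = rng.standard_normal(L) + 1j * rng.standard_normal(L)
x = np.tile(one_period, n_rep)           # stand-in for a periodic preamble

eps_true = 0.003                         # carrier offset in cycles per sample (hypothetical)
n = np.arange(x.size)
y = x * np.exp(1j * 2 * np.pi * eps_true * n)
y = y + 0.01 * (rng.standard_normal(y.size) + 1j * rng.standard_normal(y.size))

y = y[L:]                                # ignore the first cycle (channel transient)
# Samples one period apart differ only by the phase rotation 2*pi*eps*L
corr = np.sum(np.conj(y[:-L]) * y[L:])
eps_hat = np.angle(corr) / (2 * np.pi * L)

print(eps_true, eps_hat)
```

The estimate is unambiguous only while the phase rotation over one period stays within ±π, that is, for offsets below 1/(2L) cycles per sample. This is why a short period suits coarse acquisition and the longer period of the long training suits fine acquisition.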
20.1.4 Timing Acquisition 


Given a packet structure similar to that of IEEE 802.11a, the following algorithm may 
be used to find the boundaries of the training symbols $T_1$ and $T_2$. Once these are found, 
the rest of the symbols in the packet follow simply by keeping track of the signal sample 
indices. 

The timing acquisition method discussed here was first introduced in Schmidl and Cox 
(1997). Note that the long training consists of two similar length 64 sequences that are 
preceded by a length 32 cyclic prefix. Accordingly, the autocorrelation function of the 
received signal y(n), viz., 

$$r_{yy}(n) = \frac{1}{N}\sum_{k=0}^{N-1} y(n+k)\,y^*(n+N+k) \tag{20.12}$$

will find its peak when n is over the interval where y(n + k) = y(n + N + k). This 
is when the above summation is run over the terms that belong to the long training 
part of the preamble. The magnitude of $r_{yy}(n)$ begins to drop as soon as the last term 
of y(n + N + k), that is, y(n + 2N − 1), drops out of the range of the long training 
symbol $T_2$. Accordingly, the following algorithm may be used to obtain the samples at 
the boundaries of OFDM symbols within a received packet: 


• Explore $|r_{yy}(n)|$ for successive values of $n$.

• When $n$ is at the beginning of the long training, $|r_{yy}(n)|$ should reach a peak and stay
flat for 32 consecutive values. It will then begin to drop.

• Identify the value of $n$ for which $|r_{yy}(n)|$ begins to drop. Add $2N = 128$ to this value
of $n$ to obtain the first sample of the signal field OFDM symbol.

• The succeeding data symbols begin at a period of $N + N_{\rm cp} = 80$ samples.
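
The steps above can be tried on a synthetic signal. The following is only a minimal sketch under idealized assumptions that are not part of the text: the samples are noise free, every sample has unit magnitude and a random phase, and the names `autocorr_mag`, `unit_samples`, `lead_in`, and `payload` are hypothetical. With noise present, the tight plateau threshold used here would have to be relaxed.

```python
import cmath
import random


def autocorr_mag(y, N):
    """Return |r_yy(n)| of Eq. (20.12) for every n at which the sum is defined."""
    mags = []
    for n in range(len(y) - 2 * N + 1):
        acc = sum(y[n + k] * y[n + N + k].conjugate() for k in range(N))
        mags.append(abs(acc) / N)
    return mags


def unit_samples(rng, count):
    """Unit-magnitude samples with random phase (a stand-in for real waveforms)."""
    return [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(count)]


rng = random.Random(0)
N, N_PREFIX = 64, 32                     # long-training period and its cyclic prefix
lead_in = unit_samples(rng, 50)          # hypothetical samples before the long training
period = unit_samples(rng, N)
long_training = period[-N_PREFIX:] + period + period
payload = unit_samples(rng, 200)         # hypothetical signal field and data samples
y = lead_in + long_training + payload

r = autocorr_mag(y, N)
peak = max(r)
# Noise-free case: plateau samples equal the peak up to rounding error.
plateau = [n for n, v in enumerate(r) if v >= peak * (1 - 1e-9)]
n_drop = plateau[-1]                     # last n before |r_yy(n)| starts to fall
signal_field_start = n_drop + 2 * N
print(plateau[0], n_drop, signal_field_start)
```

In this sketch the plateau starts at the first sample of the long-training cyclic prefix, and `signal_field_start` is the index of the first sample that follows the long training.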


To help the reader examine the above concept, as well as the equalization method
that is introduced below, a MATLAB script, named "OFDM.m," is provided on the
accompanying website. This code provides a skeleton for generation of a packet of OFDM data
sequence and the corresponding preamble. Adding additional lines to this code to obtain
$r_{yy}(n)$ and plotting a sample result, we obtain Figure 20.3. It may come as a surprise to
the reader that there are two similar top flats in Figure 20.3. This will not be a surprise
if one notes that the short training part of the preamble, which has a period of 16 samples,
may also be thought of as a periodic sequence whose period is any integer multiple of 16,
which of course includes the period of 64. The reader may also note that even though the
above algorithm suggested the use of the fourth corner of the plot of $|r_{yy}(n)|$ as the
reference point for locating the OFDM symbol boundaries, any of the corner points of
$|r_{yy}(n)|$ may be used. Alternatively, one may choose to exploit all four corner points of
the plot to obtain a more accurate reference for locating the symbol boundaries.


20.1.5 Channel Estimation and Frequency Domain Equalization 


Once the OFDM symbol boundary points are identified and the received signal samples
are frequency compensated, one may take either of the training symbols $T_1$ and $T_2$, or
their average, and find a channel response $\mathbf{h} = [h_0\ h_1\ \cdots\ h_{L-1}]^T$ of a finite
length $L$ that, when convolved with the long training, best matches the received signal.
The result is referred to as $\hat{\mathbf{h}}$. Note that in common practice $L$ is set equal to
$N_{\rm cp}$ and $N_{\rm cp} < N$. In the sequel, we discuss two approaches to finding $\hat{\mathbf{h}}$.
Once $\hat{\mathbf{h}}$ is obtained, the frequency domain equalizer coefficients are calculated by
taking the $N$-point DFT of $\hat{\mathbf{h}}$ (after appending zeros to its end to expand it to
the length of $N$) to obtain the channel frequency response $H(\omega_k)$, for
$k = 0, 1, \ldots, N-1$, and setting $w_k = 1/H(\omega_k)$.

Figure 20.3 A sample plot of $|r_{yy}(n)|$. Points that may be used for timing acquisition are
the top corners of the trapezoidal shapes on the plot. Their positions are marked as the
time indices $n_1$, $n_2$, $n_3$, and $n_4$.
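
The last step, going from a channel estimate to the single-tap equalizer coefficients, can be sketched as follows. The estimate `h_est`, the DFT size `N = 8`, and the helper names `dft` and `equalizer_taps` are illustrative assumptions rather than values or routines from the text.

```python
import cmath


def dft(x):
    """Plain N-point DFT (O(N^2)); adequate for a short illustration."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def equalizer_taps(h_est, N):
    """Zero-pad the length-L estimate to N, take the DFT, and invert each bin."""
    padded = list(h_est) + [0.0] * (N - len(h_est))
    return [1 / Hk for Hk in dft(padded)]


h_est = [1.0, 0.5, -0.25j]               # hypothetical channel estimate, L = 3
N = 8
w = equalizer_taps(h_est, N)
H = dft(list(h_est) + [0.0] * (N - len(h_est)))
equalized = [wk * Hk for wk, Hk in zip(w, H)]   # each entry should be close to 1
```

Since $w_k$ is the reciprocal of $H(\omega_k)$, a bin in which the estimated gain is close to zero produces a very large coefficient; the example channel was chosen so that this does not occur.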


Maximum Likelihood (ML) Channel Estimator 


The ML channel estimator begins with the following channel model, which is inferred
from the OFDM principle presented in Section 20.1.1:

$$\mathbf{z} = \mathbf{S}\mathbf{B}\mathbf{h} + \mathbf{v} \tag{20.13}$$

Here, $\mathbf{S}$ is the diagonal matrix containing the training symbols on its diagonal. Hence,
assuming that there are $M$ training symbols, $\mathbf{S}$ is a diagonal matrix of size $M \times M$.
When Eq. (20.13) is used in connection with the long training symbols in IEEE 802.11a,
$M = 52$ and the diagonal elements of $\mathbf{S}$ are the long training symbols listed above. The
matrix $\mathbf{B}$ in Eq. (20.13) transforms $\mathbf{h}$ to the channel frequency domain gains
$H(\omega_k)$. It is, thus, of size $M \times L$ and its $ml$th element is given by

$$b_{ml} = e^{-j2\pi k_m l/N} \tag{20.14}$$

where $k_m$ is the index corresponding to the $m$th training symbol. For IEEE 802.11a, $k_m$
takes the values of $-26$ to $26$, excluding 0. The vectors $\mathbf{v}$ and $\mathbf{z}$, both of size
$M \times 1$, contain the elements of the DFT of the channel noise samples and the received
signal samples that correspond to the training symbols. Also, the reader should note that
we have dropped the time index $n$ for convenience.

Given the vector $\mathbf{z}$, the ML channel estimator selects the estimate of $\mathbf{h}$ by
minimizing the cost function

$$\xi_{\rm ML} = \|\mathbf{z} - \mathbf{S}\mathbf{B}\mathbf{h}\|^2 \tag{20.15}$$

where $\|\cdot\|$ denotes the norm of a vector, that is, for a vector $\mathbf{a}$,
$\|\mathbf{a}\| = \sqrt{\mathbf{a}^H\mathbf{a}}$. Under the condition that the elements of $\mathbf{v}$ are a set
of independent identically distributed Gaussian random variables, minimization of
$\xi_{\rm ML}$ results in the most likely value of $\mathbf{h}$, that is, the choice that maximizes the
likelihood $p(\mathbf{z}|\mathbf{h})$ (equivalently, the a posteriori probability $p(\mathbf{h}|\mathbf{z})$ when
no prior knowledge of $\mathbf{h}$ is assumed). The name ML channel estimator reflects this fact.

Minimization of the cost function $\xi_{\rm ML}$ is a least-squares problem. When $L > M$,
it is an underdetermined problem, that is, there are more parameters (the number of
elements of $\mathbf{h}$) than sample points (the number of elements of $\mathbf{z}$). In this case,
one or more solutions for $\mathbf{h}$ can be found that match the vectors $\mathbf{z}$ and
$\mathbf{S}\mathbf{B}\mathbf{h}$ perfectly.

A case of more interest in practice is when $M > L$. In this case, the problem is
overdetermined, and following the procedure presented in Chapter 12, one finds that the
following estimate minimizes $\xi_{\rm ML}$:

$$\hat{\mathbf{h}} = (\mathbf{B}^H\mathbf{S}^H\mathbf{S}\mathbf{B})^{-1}\mathbf{B}^H\mathbf{S}^H\mathbf{z} \tag{20.16}$$

Assuming that the training symbols are from a PSK constellation with the magnitude of
unity, $\mathbf{S}^H\mathbf{S} = \mathbf{I}$ and Eq. (20.16) reduces to

$$\hat{\mathbf{h}} = (\mathbf{B}^H\mathbf{B})^{-1}\mathbf{B}^H\mathbf{S}^H\mathbf{z} \tag{20.17}$$
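
As an illustration of Eq. (20.17), the sketch below builds $\mathbf{B}$ from Eq. (20.14) and solves the $L \times L$ normal equations with a small Gauss-Jordan routine. The sizes ($N = 16$, $M = 12$, $L = 3$), the BPSK training symbols, the test channel, and the function names `solve` and `ml_channel_estimate` are all assumptions made for the example; they are not the IEEE 802.11a values.

```python
import cmath


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ml_channel_estimate(z, s, k_idx, N, L):
    """Eq. (20.17): h_hat = (B^H B)^{-1} B^H S^H z, valid when |s_m| = 1."""
    M = len(z)
    B = [[cmath.exp(-2j * cmath.pi * k_idx[m] * l / N) for l in range(L)]
         for m in range(M)]
    u = [s[m].conjugate() * z[m] for m in range(M)]                  # S^H z
    BhB = [[sum(B[m][i].conjugate() * B[m][j] for m in range(M))
            for j in range(L)] for i in range(L)]
    Bhu = [sum(B[m][i].conjugate() * u[m] for m in range(M)) for i in range(L)]
    return solve(BhB, Bhu)


N, L = 16, 3
k_idx = [k for k in range(-6, 7) if k != 0]        # hypothetical training subcarriers
s = [1 if m % 2 == 0 else -1 for m in range(len(k_idx))]   # unit-magnitude symbols
h_true = [1.0, 0.5j, -0.25]
# Noise-free received vector z = S B h
z = [s[m] * sum(cmath.exp(-2j * cmath.pi * k_idx[m] * l / N) * h_true[l]
                for l in range(L)) for m in range(len(k_idx))]
h_hat = ml_channel_estimate(z, s, k_idx, N, L)
```

In this noise-free setting `h_hat` reproduces `h_true` up to rounding error, which is a convenient check on the construction of $\mathbf{B}$.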


Linear Minimum Mean-Squared Error (LMMSE) Channel Estimator 


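The LMMSE channel estimator finds an estimate of $\mathbf{h}$ according to the equation

$$\hat{\mathbf{h}} = \mathbf{W}^H\mathbf{z} \tag{20.18}$$

where

$$\mathbf{W} = [\mathbf{w}_0\ \mathbf{w}_1\ \cdots\ \mathbf{w}_{L-1}] \tag{20.19}$$

is the estimator coefficient matrix, and $\mathbf{w}_0$ through $\mathbf{w}_{L-1}$ are column vectors of
size $M$. The coefficient matrix is found by minimizing the cost function

$$\xi_{\rm LMMSE} = E[\|\hat{\mathbf{h}} - \mathbf{h}\|^2] \tag{20.20}$$

Substituting Eqs. (20.18) and (20.19) in Eq. (20.20), and rearranging the result, we obtain

$$\xi_{\rm LMMSE} = \sum_{l=0}^{L-1} E[|\mathbf{w}_l^H\mathbf{z} - h_l|^2] \tag{20.21}$$

Furthermore, we note that the problem of minimizing $\xi_{\rm LMMSE}$ can be broken into $L$
simpler problems of minimizing the individual terms under the summation on the
right-hand side of Eq. (20.21). Moreover, each of the latter problems has the familiar
form that was first presented in Chapter 3 and has the solution

$$\mathbf{w}_{o,l} = \mathbf{R}_{zz}^{-1}\mathbf{p}_l, \quad \text{for } l = 0, 1, \ldots, L-1 \tag{20.22}$$

where, here, $\mathbf{R}_{zz} = E[\mathbf{z}\mathbf{z}^H]$ and $\mathbf{p}_l = E[\mathbf{z}h_l^*]$. Combining the results in
Eq. (20.22), we get

$$\mathbf{W}_o = \mathbf{R}_{zz}^{-1}\mathbf{P} \tag{20.23}$$

where

$$\mathbf{P} = [\mathbf{p}_0\ \mathbf{p}_1\ \cdots\ \mathbf{p}_{L-1}] = E[\mathbf{z}\mathbf{h}^H] \tag{20.24}$$

To develop some insight into the properties of the LMMSE channel estimator, we recall
that in Eq. (20.13) the matrices $\mathbf{S}$ and $\mathbf{B}$ are deterministic, and the vectors
$\mathbf{h}$ and $\mathbf{v}$ are unknown random variables. Hence,

$$\mathbf{R}_{zz} = \mathbf{S}\mathbf{B}\mathbf{R}_{hh}\mathbf{B}^H\mathbf{S}^H + \mathbf{R}_{vv} \tag{20.25}$$

where $\mathbf{R}_{hh} = E[\mathbf{h}\mathbf{h}^H]$ and we have defined $\mathbf{R}_{vv} = E[\mathbf{v}\mathbf{v}^H]$. Also,
assuming that $\mathbf{h}$ and $\mathbf{v}$ are independent random vectors,

$$\mathbf{P} = \mathbf{S}\mathbf{B}\mathbf{R}_{hh} \tag{20.26}$$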


From the results presented in Eqs. (20.23), (20.25), (20.26), and (20.16) (or Eq. (20.17)),
we make the following observations:

• While the ML channel estimator does not take into account the statistics of $\mathbf{h}$, the
LMMSE channel estimator optimization makes use of such statistics.

• The ML channel estimator makes the assumption that the elements of $\mathbf{v}$ are Gaussian
and independent of one another but does not care about their variance. The LMMSE
channel estimator, on the other hand, does not require that the elements of $\mathbf{v}$ be
Gaussian and/or independent of one another, but makes use of the correlation among the
elements of $\mathbf{v}$ to improve on the accuracy of the estimated channel.

Clearly, when the channel and noise statistics, $\mathbf{R}_{hh}$ and $\mathbf{R}_{vv}$, respectively, are
known, the LMMSE estimator, which makes use of them, naturally outperforms the ML
estimator, which ignores this information.
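
As a sketch of how the two estimators relate, Eqs. (20.18), (20.23), (20.25), and (20.26) may be combined into a single expression. The special case in the second line rests on assumptions that are not made in the text above: white noise with $\mathbf{R}_{vv} = \sigma_v^2\mathbf{I}$, unit-magnitude training symbols ($\mathbf{S}^H\mathbf{S} = \mathbf{I}$), and an invertible $\mathbf{R}_{hh}$.

$$\hat{\mathbf{h}} = \mathbf{W}_o^H\mathbf{z} = \mathbf{P}^H\mathbf{R}_{zz}^{-1}\mathbf{z} = \mathbf{R}_{hh}\mathbf{B}^H\mathbf{S}^H\left(\mathbf{S}\mathbf{B}\mathbf{R}_{hh}\mathbf{B}^H\mathbf{S}^H + \mathbf{R}_{vv}\right)^{-1}\mathbf{z}$$

$$\hat{\mathbf{h}} = \left(\mathbf{B}^H\mathbf{B} + \sigma_v^2\mathbf{R}_{hh}^{-1}\right)^{-1}\mathbf{B}^H\mathbf{S}^H\mathbf{z}$$

Under the stated assumptions, the second form follows from the first by the matrix inversion lemma. As $\sigma_v^2\mathbf{R}_{hh}^{-1}$ becomes negligible next to $\mathbf{B}^H\mathbf{B}$, it approaches the ML estimate of Eq. (20.17).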


20.1.6 Estimation of $\mathbf{R}_{hh}$ and $\mathbf{R}_{vv}$


In practice, channel models are built on the basis of empirical measurements and,
accordingly, $\mathbf{R}_{hh}$ is precalculated. Additional adjustment to $\mathbf{R}_{vv}$ may also be
made if the SNR at the receiver can be measured.

In the case of IEEE 802.11a, the packet format allows the following simple method of
estimating $\mathbf{R}_{vv}$. Consider the channel model (20.13) and let

$$\mathbf{z}(n) = \mathbf{S}\mathbf{B}\mathbf{h} + \mathbf{v}(n), \quad \text{for } n = 1, 2 \tag{20.27}$$

denote the two long training symbols of a received data packet. Note that $\mathbf{S}$ is not time
indexed, because the same symbols are transmitted for both $n = 1$ and $n = 2$. Next, if
we define the measurable vector

$$\mathbf{u} = \mathbf{z}(1) - \mathbf{z}(2) \tag{20.28}$$

Equation (20.27) implies that

$$\mathbf{u} = \mathbf{v}(1) - \mathbf{v}(2) \tag{20.29}$$

Assuming that $\mathbf{v}(1)$ and $\mathbf{v}(2)$ are independent vectors, computation of the correlation
coefficients of the elements of $\mathbf{u}$ leads to an estimated correlation matrix
$\hat{\mathbf{R}}_{uu}$, which will be twice the respective estimate $\hat{\mathbf{R}}_{vv}$ of $\mathbf{R}_{vv}$.
Hence, to obtain an estimate of $\mathbf{R}_{vv}$, one may use Eq. (20.28) to calculate $\mathbf{u}$.
This is then used to obtain an estimate $\hat{\mathbf{R}}_{uu}$ of $\mathbf{R}_{uu}$, and then

$$\hat{\mathbf{R}}_{vv} = \frac{1}{2}\hat{\mathbf{R}}_{uu} \tag{20.30}$$

is calculated.
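
A minimal sketch of Eqs. (20.28) through (20.30) is given below. The vectors `common`, `v1`, and `v2` are made-up numbers, and the function names are hypothetical. Note that a single pair of training symbols yields only the rank-one estimate $\frac{1}{2}\mathbf{u}\mathbf{u}^H$; the scalar average over the diagonal is an extra simplification that assumes equal noise variance in all bins.

```python
def noise_correlation_estimate(z1, z2):
    """Return (1/2) u u^H with u = z(1) - z(2), following Eqs. (20.28) and (20.30)."""
    u = [a - b for a, b in zip(z1, z2)]
    return [[0.5 * ui * uj.conjugate() for uj in u] for ui in u]


def noise_variance_estimate(z1, z2):
    """Average of the diagonal of the estimate above (assumes equal variance per bin)."""
    R = noise_correlation_estimate(z1, z2)
    return sum(R[m][m].real for m in range(len(R))) / len(R)


common = [1 + 2j, -3j, 0.5]              # hypothetical SBh, identical in both symbols
v1 = [0.1, -0.2j, 0.0]                   # hypothetical noise in the first symbol
v2 = [0.0, 0.1, 0.3j]                    # hypothetical noise in the second symbol
z1 = [c + v for c, v in zip(common, v1)]
z2 = [c + v for c, v in zip(common, v2)]
sigma2_hat = noise_variance_estimate(z1, z2)
```

Because the common term $\mathbf{SBh}$ cancels in the difference, the estimate depends only on the noise samples, as Eq. (20.29) indicates.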


20.1.7 Carrier-Tracking Methods 


Most standards have adopted the use of pilot symbols for carrier tracking. For instance,
in IEEE 802.11a, the subcarrier numbers 7, 21, 43, and 57 are continuously filled in
with pilot symbols that are known to the receiver. It has also been noted that the unused
0th subcarrier and the unused subcarriers at the two sides of the OFDM band may be
treated as subcarriers with known null transmit symbols. These are commonly referred to
as virtual subcarriers. In the section "pilot-aided carrier-tracking algorithms," below, we
treat pilot and virtual subcarriers similarly and present a related carrier-tracking algorithm.

There are also several other carrier-tracking algorithms that make use of the redundancy 
introduced by CP. These fall under the general class of blind carrier-tracking methods. 
Examples of algorithms related to these methods are presented in the section “blind 
carrier-tracking algorithms.” 


Pilot-Aided Carrier-Tracking Algorithms 


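In the absence of carrier phase and frequency offset, the $n$th cyclic-prefix-stripped
received OFDM symbol is given by Eq. (20.6). When there is a phase offset $\phi_0$ and a
carrier frequency offset $\Delta\omega_c$, the $n$th cyclic-prefix-stripped received OFDM symbol
is obtained as

$$\mathbf{y}(n) = e^{j\phi_0}\boldsymbol{\Phi}(\Delta\omega_c)\sum_{k=0}^{N-1}\frac{H(\omega_k)}{N}s_k(n)\mathbf{f}_k + \mathbf{v}(n) \tag{20.31}$$

where $\mathbf{v}(n)$ is the channel noise vector and

$$\boldsymbol{\Phi}(\Delta\omega_c) = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & e^{j\Delta\omega_c} & 0 & \cdots & 0 \\ 0 & 0 & e^{j2\Delta\omega_c} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & e^{j(N-1)\Delta\omega_c} \end{bmatrix} \tag{20.32}$$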
To proceed, we note that the phase offset factor $e^{j\phi_0}$ is a constant that can be
absorbed in the tap weights of the single-tap equalizers. Hence, we remove the factor from
the right-hand side of Eq. (20.31) for the rest of our discussion. With this modification,
Eq. (20.31) reduces to

$$\mathbf{y}(n) = \boldsymbol{\Phi}(\Delta\omega_c)\sum_{k=0}^{N-1}\frac{H(\omega_k)}{N}s_k(n)\mathbf{f}_k + \mathbf{v}(n) \tag{20.33}$$

We note that $\boldsymbol{\Phi}(\Delta\omega_c)$ is an undesired factor that has been introduced as a result
of the frequency offset. Moreover, premultiplying $\mathbf{y}(n)$ by
$\boldsymbol{\Phi}^*(\Delta\omega_c) = \boldsymbol{\Phi}(-\Delta\omega_c)$ will result in the carrier offset free vector

$$\tilde{\mathbf{y}}(n) = \boldsymbol{\Phi}(-\Delta\omega_c)\mathbf{y}(n) = \sum_{k=0}^{N-1}\frac{H(\omega_k)}{N}s_k(n)\mathbf{f}_k + \boldsymbol{\Phi}(-\Delta\omega_c)\mathbf{v}(n) \tag{20.34}$$
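To develop an adaptive algorithm for correcting the carrier frequency offset, we define
the vector

$$\hat{\mathbf{y}}(n) = \boldsymbol{\Phi}(\varphi)\mathbf{y}(n) \tag{20.35}$$

and search for the value of $\varphi$ that minimizes the cost function

$$J(\varphi) = \sum_{k\in\mathcal{P}} E\left[\left|\mathbf{f}_k^H\hat{\mathbf{y}}(n) - H(\omega_k)s_k(n)\right|^2\right] \tag{20.36}$$

where $\mathcal{P}$ is the set of subcarrier numbers of the pilot symbols. An LMS-type algorithm
can now be implemented by replacing $J(\varphi)$ with its coarse estimate

$$\hat{J}(\varphi) = \sum_{k\in\mathcal{P}} \left|\mathbf{f}_k^H\hat{\mathbf{y}}(n) - H(\omega_k)s_k(n)\right|^2 \tag{20.37}$$

and running the gradient update equation

$$\varphi(n+1) = \varphi(n) - \mu\frac{\partial\hat{J}(\varphi)}{\partial\varphi} \tag{20.38}$$

Substituting Eq. (20.37) in Eq. (20.38) and noting that

$$\frac{\partial\boldsymbol{\Phi}(\varphi)}{\partial\varphi} = j\boldsymbol{\Lambda}\boldsymbol{\Phi}(\varphi) \tag{20.39}$$

where

$$\boldsymbol{\Lambda} = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & (N-1) \end{bmatrix}$$

we obtain

$$\varphi(n+1) = \varphi(n) - 2\mu\sum_{k\in\mathcal{P}}\Re\left\{j\,\mathbf{f}_k^H\boldsymbol{\Lambda}\hat{\mathbf{y}}(n)\left(\mathbf{f}_k^H\hat{\mathbf{y}}(n) - H(\omega_k)s_k(n)\right)^*\right\} \tag{20.40}$$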


The update equation (20.40) can be improved further by adding the unused subcarriers
to the list of pilots. Here, the pilot symbols are equal to zero. As mentioned above,
the unused subcarriers are called virtual subcarriers and, accordingly, the related carrier
recovery algorithms are referred to as virtual subcarriers-based carrier recovery methods.
Note that even if there are no pilot symbols, the virtual/null subcarriers alone may be
used for carrier tracking. In that case, Eq. (20.40) converts to

$$\varphi(n+1) = \varphi(n) - 2\mu\sum_{k\in\mathcal{V}}\Re\left\{j\,\mathbf{f}_k^H\boldsymbol{\Lambda}\hat{\mathbf{y}}(n)\left(\mathbf{f}_k^H\hat{\mathbf{y}}(n)\right)^*\right\} \tag{20.41}$$

where $\mathcal{V}$ is the set of subcarrier numbers of the virtual subcarriers. Clearly, the update
equations (20.40) and (20.41) may also be combined.

We note that an implementation of Eq. (20.40) requires the knowledge of the channel
coefficients $H(\omega_k)$, which should be replaced by their estimates. On the other hand, if we
limit ourselves to the virtual subcarriers, where Eq. (20.41) is applicable, the channel
knowledge is not required. Yet, there are other methods that can be used to track the
carrier frequency without even using the knowledge of the position of pilot or virtual
subcarriers. These methods are referred to as blind carrier-tracking algorithms.
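
The pilot-aided update of Eq. (20.40) can be sketched on a toy example. Everything specific here is an assumption made for illustration: $N = 16$, four pilots with unit symbols, a flat channel with $H(\omega_k) = 1$, no data subcarriers, no noise, the same received symbol reused at every iteration, and the names `dft_bin` and `lms_step`. The vectors $\mathbf{f}_k$ are taken as $[1\ e^{j2\pi k/N}\ \cdots\ e^{j2\pi k(N-1)/N}]^T$. Setting a reference value to zero turns the same routine into the virtual-subcarrier update of Eq. (20.41).

```python
import cmath


def dft_bin(v, k):
    """f_k^H v with f_k = [1, e^{j2*pi*k/N}, ..., e^{j2*pi*k(N-1)/N}]^T."""
    N = len(v)
    return sum(v[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))


def lms_step(phi, y, pilots, mu):
    """One iteration of Eq. (20.40); pilots maps k to the product H(w_k) s_k(n)."""
    N = len(y)
    y_hat = [cmath.exp(1j * phi * n) * y[n] for n in range(N)]   # Phi(phi) y(n)
    lam_y = [n * y_hat[n] for n in range(N)]                     # Lambda y_hat(n)
    grad = 0.0
    for k, ref in pilots.items():
        err = dft_bin(y_hat, k) - ref
        grad += (1j * dft_bin(lam_y, k) * err.conjugate()).real
    return phi - 2 * mu * grad


N, d_omega = 16, 0.02                    # offset in radians per sample (assumed value)
pilots = {1: 1.0, 5: 1.0, 9: 1.0, 13: 1.0}
# Offset-free symbol: sum_k (H s_k / N) f_k, followed by the offset Phi(d_omega)
x = [sum(ref * cmath.exp(2j * cmath.pi * k * n / N) for k, ref in pilots.items()) / N
     for n in range(N)]
y = [cmath.exp(1j * d_omega * n) * x[n] for n in range(N)]

phi = 0.0
for _ in range(50):
    phi = lms_step(phi, y, pilots, mu=0.002)
```

In this idealized setting `phi` settles at the negative of the offset, so that $\boldsymbol{\Phi}(\varphi)$ undoes $\boldsymbol{\Phi}(\Delta\omega_c)$. The step size was picked for this particular example; a real receiver would see a different symbol, noise, and data subcarriers at every iteration.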


Blind Carrier-Tracking Algorithms 


There are several blind carrier-tracking algorithms in the literature. In this section, we
present one of them. The presented algorithm takes note of the fact that the addition of
a cyclic prefix to OFDM symbols adds a regular variation (ripple) to the signal spectrum
that may be exploited to develop a simple carrier-tracking algorithm. The blind
carrier-tracking method that is presented here is due to Talbot (2008); also see Talbot and
Farhang-Boroujeny (2008).


OFDM spectrum 

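As discussed earlier, an OFDM signal is constructed by concatenating a set of cyclic
prefixed symbols. Let us use $\mathbf{x}^{\rm cp}(n)$ to denote the $n$th cyclic prefixed symbol of the
OFDM signal $x(n)$. Following similar presentations to those of Eq. (20.3) through Eq.
(20.5), here, we have

$$\mathbf{x}^{\rm cp}(n) = \sum_{k=0}^{N-1}\mathbf{x}_k^{\rm cp}(n) \tag{20.42}$$

where

$$\mathbf{x}_k^{\rm cp}(n) = s_k(n)\mathbf{f}_k^{\rm cp}, \quad \text{for } k = 0, 1, 2, \ldots, N-1 \tag{20.43}$$

and

$$\mathbf{f}_k^{\rm cp} = \frac{1}{N}\begin{bmatrix} e^{j\frac{2\pi k}{N}(N-N_{\rm cp})} \\ \vdots \\ e^{j\frac{2\pi k}{N}(N-1)} \\ 1 \\ e^{j\frac{2\pi k}{N}\times 1} \\ e^{j\frac{2\pi k}{N}\times 2} \\ \vdots \\ e^{j\frac{2\pi k}{N}(N-1)} \end{bmatrix} \tag{20.44}$$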
We are interested in evaluating the power spectral density of $x(n)$. To this end, we first
assume that the $s_k(n)$'s are independent across the frequency axis (the index $k$), and note
that this implies

$$\Phi_{xx}(\omega) = \sum_{k\in\mathcal{K}}\Phi_{x_kx_k}(\omega) \tag{20.45}$$

where $\mathcal{K}$ is the set of subcarrier indices of the active (data and pilot) subcarriers,
$\Phi_{xx}(\omega)$, following the notations in Chapter 2, denotes the PSD of $x(n)$, and, similarly,
$\Phi_{x_kx_k}(\omega)$ denotes the PSD of the sequence $x_k(n)$ obtained by concatenating the
$\mathbf{x}_k^{\rm cp}(n)$'s.

Next, to evaluate $\Phi_{x_kx_k}(\omega)$, we note that

$$x_k(n) = \frac{1}{N}\sum_{l=-\infty}^{\infty}s_k(l)\,g\big(n - l(N+N_{\rm cp})\big)\,e^{j\frac{2\pi k}{N}\left(n - l(N+N_{\rm cp})\right)} \tag{20.46}$$

where $g(n)$ is referred to as the symbol shaping window function. It is a rectangular
window that confines the duration of each OFDM symbol to $N + N_{\rm cp}$ samples. More
specifically,

$$g(n) = \begin{cases} 1, & -N_{\rm cp} \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \tag{20.47}$$

Equation (20.46) implies that $x_k(n)$ is obtained by passing the symbol sequence $s_k(n)$
through the system shown in Figure 20.4.

Following Figure 20.4, the PSD of $x_k(n)$ is obtained as

$$\Phi_{x_kx_k}(\omega) = \frac{\sigma_s^2}{(N+N_{\rm cp})N^2}\left|G\!\left(\omega - \frac{2\pi k}{N}\right)\right|^2 \tag{20.48}$$

where $\sigma_s^2$ is the variance of $s_k(n)$ and $G(\omega)$ is the Fourier transform of $g(n)$. In Eq.
(20.48), the frequency shift of $2\pi k/N$ in the argument of $G$ is a consequence of the
modulating factor $e^{j2\pi kn/N}$ that multiplies $g(n)$ in Figure 20.4, and the factor $1/N^2$
arises because of the factor $1/N$ in front of $g(n)$. The additional factor
$1/(N + N_{\rm cp})$ is included because there is only one $s_k(n)$ for every $N + N_{\rm cp}$ samples
of $x_k(n)$; hence, its energy averages out over $N + N_{\rm cp}$ samples.

Substituting Eq. (20.48) in Eq. (20.45), we obtain

$$\Phi_{xx}(\omega) = \frac{\sigma_s^2}{(N+N_{\rm cp})N^2}\sum_{k\in\mathcal{K}}\left|G\!\left(\omega - \frac{2\pi k}{N}\right)\right|^2 \tag{20.49}$$

Figure 20.4 The process of generating $x_k(n)$ from $s_k(n)$.

Figure 20.5 The impact of the cyclic prefix length on the variation of the width of the main
lobe of $G(\omega)$.


Also, it is straightforward to show that for the rectangular window (20.47) (see
Problem P20.5, at the end of this chapter)

$$|G(\omega)|^2 = \sum_{n=-(N+N_{\rm cp}-1)}^{N+N_{\rm cp}-1}\left(N + N_{\rm cp} - |n|\right)\cos(n\omega) \tag{20.50}$$

Figure 20.5 presents a set of plots of $|G(\omega)|^2$, normalized to the maximum amplitude
of unity. The parameter $N$ is set equal to 16 and $N_{\rm cp}$ (the length of CP) is given the
values of 0, 4, and 16. We note that for a fixed $N$, as $N_{\rm cp}$ increases, the width of the
main lobe of $|G(\omega)|^2$ decreases. On the other hand, the subcarrier spacing remains
constant at $1/N$. Hence, Eq. (20.49) implies that, for a fixed $N$, increasing the length of CP
induces ripples with larger amplitudes in the in-band region of the OFDM PSD. This is shown
in Figure 20.6 for the case where $N = 16$, there are 10 active subcarriers at frequencies
$f = \pm\frac{1}{16}, \pm\frac{2}{16}, \ldots, \pm\frac{5}{16}$, and $N_{\rm cp}$ is given the values of 0, 4,
and 16. The magnitudes of the different spectra are also normalized to the maximum of unity
to allow a quantitative comparison.
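
The ripple described above can be reproduced numerically from Eqs. (20.49) and (20.50). The sketch below is illustrative only: the function names are hypothetical, $\sigma_s^2$ is set to 1, and the subcarrier set is the 10-subcarrier example with $N = 16$.

```python
import cmath
import math


def window_psd_closed_form(omega, N, N_cp):
    """|G(omega)|^2 from Eq. (20.50)."""
    Lw = N + N_cp
    return sum((Lw - abs(n)) * math.cos(n * omega) for n in range(-(Lw - 1), Lw))


def window_psd_direct(omega, N, N_cp):
    """|G(omega)|^2 computed directly from the window of Eq. (20.47)."""
    G = sum(cmath.exp(-1j * omega * n) for n in range(-N_cp, N))
    return abs(G) ** 2


def ofdm_psd(omega, N, N_cp, active, sigma_s2=1.0):
    """Eq. (20.49) evaluated at a single frequency omega (radians per sample)."""
    scale = sigma_s2 / ((N + N_cp) * N ** 2)
    return scale * sum(window_psd_closed_form(omega - 2 * math.pi * k / N, N, N_cp)
                       for k in active)


N = 16
active = [k for k in range(-5, 6) if k != 0]       # 10 active subcarriers
center = 2 * math.pi * 2 / N                       # on subcarrier k = 2
midpoint = 2 * math.pi * 2.5 / N                   # halfway between k = 2 and k = 3
ratio_no_cp = ofdm_psd(midpoint, N, 0, active) / ofdm_psd(center, N, 0, active)
ratio_long_cp = ofdm_psd(midpoint, N, 16, active) / ofdm_psd(center, N, 16, active)
```

With no cyclic prefix the PSD halfway between two subcarriers stays close to its value on a subcarrier, whereas with $N_{\rm cp} = N$ it falls to essentially zero, which is the extreme case of the ripple.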


Impact of Carrier Offset 

The spectral effect of a carrier offset is, obviously, a shift of the signal spectrum. For
example, if the OFDM signal corresponding to the PSD for the case of $N_{\rm cp} = 4$ in
Figure 20.6 experiences a carrier offset of $\Delta f_c = \Delta\omega_c/2\pi = 0.375/N$, then the
normalized OFDM PSD will be the one shown in Figure 20.7. As the spectral shift is
directly related to the size of the carrier offset, $\Delta f_c$, a measurement of the spectral
shift is a measurement of the carrier offset.

Figure 20.7 OFDM PSD for a frequency offset $\Delta f_c = 0.375/N$; $N = 16$, there are 10 active
subcarriers, and $N_{\rm cp} = 4$.


Locating the Spectral Peaks 

To locate the spectral peaks in the OFDM PSD, we adopt the use of a comb filter $F(z)$ in
which the comb spacing is equal to the subcarrier spacing and the filter nulls are placed
halfway between the subcarrier locations. The normalized frequency response of one such
filter, corresponding to a subcarrier spacing of 1/16, that is, when $N = 16$, is shown in
Figure 20.8. Suppose we shift the nulls of this filter by an amount $\varepsilon$, and at the same
time preserve the spacing between the nulls. This is equivalent to circularly shifting the
filter response over the normalized frequency range 0 to 1 (or, equivalently, the range
$-0.5$ to $0.5$).

Figure 20.8 An example of the frequency response of the comb filter $F(z)$.

We pass the demodulated received signal $y(n)$ through the comb filter $F(z)$ and call its
output $y_f(n)$. If we measure the power of $y_f(n)$ as a function of $\varepsilon$, we will obtain a
plot similar to the one shown in Figure 20.9. We can see that the power of the output
consists of many peaks and troughs as we vary $\varepsilon$ across the signal bandwidth. However,
if we choose to look at the portion of the plot where $\varepsilon$ is in the range $\pm 0.5$ of the
subcarrier spacing around the true carrier offset, that is,
$-\frac{1}{2N} < \varepsilon < \frac{1}{2N}$, we will find that the power of $y_f(n)$ has only one
maximum.

With this in mind, we choose to adapt $F(z)$ to maximize the cost function

$$\xi(n, \varepsilon) = E[|y_f(n)|^2] \tag{20.51}$$

Obviously, this maximization will be achieved when the passbands of the comb filter
$F(z)$ align with the OFDM PSD, and the total shift $\varepsilon$ of the filter is an estimate of the
carrier offset. It should also be noted that the proposed tracking algorithm relies on the
ability of an acquisition algorithm to provide a carrier offset estimate within one-half of
the subcarrier spacing.


The Comb Filter 
One possibility for the comb filter is to choose an FIR filter whose zeros are on the
unit circle exactly halfway between all subcarriers, as shown in Figure 20.10a and b. The
placement of the data and null subcarriers is also shown for convenience.

Figure 20.10a corresponds to the case where there is no carrier offset. In this case,
the zeros, which lie halfway between adjacent subcarriers, are located at

$$z_k = e^{j(2\pi k/N + \pi/N)}, \quad \text{for } k = 0, 1, \ldots, N-1 \tag{20.52}$$

These correspond to the transfer function

$$F(z) = \prod_{k=0}^{N-1}\left[1 - z_kz^{-1}\right] = 1 + z^{-N} \tag{20.53}$$

Figure 20.9 Variation of the power of $y_f(n)$ as $\varepsilon$ varies.

Figure 20.10 Placement of zeros of the comb filter $F(z)$: (a) nonrotated filter zeros
($\theta = 0$), and (b) rotated filter zeros.

Figure 20.10b, on the other hand, corresponds to the case where there is a shift of $\theta$ in
the spectrum of the OFDM signal. This corresponds to a carrier offset

$$\Delta f_c = \frac{\theta}{2\pi} \tag{20.54}$$

To align $F(z)$ with the shifted spectrum, its zeros should be modified as

$$z_k = e^{j(\theta + 2\pi k/N + \pi/N)}, \quad \text{for } k = 0, 1, \ldots, N-1 \tag{20.55}$$

that is, the zeros of $F(z)$ should be rotated by the angle $\theta$ over the unit circle
$|z| = 1$, as shown in Figure 20.10b. This leads to the transfer function

$$F(z) = \prod_{i=0}^{N-1}\left[1 - e^{j(\theta + 2\pi i/N + \pi/N)}z^{-1}\right] = 1 + e^{jN\theta}z^{-N} \tag{20.56}$$

Recalling the relationship $\varepsilon = \Delta f_c$, we obtain from Eq. (20.54),

$$\theta = 2\pi\varepsilon \tag{20.57}$$

Using Eq. (20.57), we can describe the filter $F(z)$ as

$$F(z) = 1 + e^{j2\pi\varepsilon N}z^{-N} \tag{20.58}$$

In Figure 20.10, one may note that the zeros are placed around null subcarriers as well
as data subcarriers, even though there is little signal power to track at locations near the
null subcarriers. If zeros were placed only around data subcarriers, Eq. (20.58) would be
more complicated. The uniform placement of zeros around the unit circle is thus chosen
to obtain the simple expression of Eq. (20.58).
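
The collapse of the $N$-factor product into a two-term filter can be checked numerically. The sketch below multiplies out the factors of Eq. (20.56) for $N = 16$ and an arbitrary rotation $\theta = 0.3$ rad; the function name `comb_coeffs` and the chosen values are assumptions made for the example.

```python
import cmath
import math


def comb_coeffs(N, theta=0.0):
    """Coefficients of z^0, z^-1, ..., z^-N of prod_k (1 - z_k z^-1), Eq. (20.56)."""
    coeffs = [1 + 0j]
    for k in range(N):
        zk = cmath.exp(1j * (theta + 2 * math.pi * k / N + math.pi / N))
        nxt = coeffs + [0j]                     # room for one more power of z^-1
        for i in range(len(coeffs)):
            nxt[i + 1] -= zk * coeffs[i]        # multiply by (1 - z_k z^-1)
        coeffs = nxt
    return coeffs


N, theta = 16, 0.3
c = comb_coeffs(N, theta)
largest_middle = max(abs(ci) for ci in c[1:N])  # should vanish up to rounding
last_coeff = c[N]                               # should equal e^{jN theta}
```

Only the first and last coefficients survive, so the filter can be run with a single delay of $N$ samples and one complex multiplication per output sample.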


Adaptation of the Comb Filter (Carrier Offset Tracking) 
With $F(z)$ in place, it is now possible to derive update equations for carrier tracking. The
transfer function $F(z)$ can be written as

$$F(z) = \frac{Y_f(z)}{Y(z)} = 1 + e^{j2\pi\varepsilon N}z^{-N} \tag{20.59}$$

where $Y(z)$ and $Y_f(z)$ are the z-transforms of the received signal $y(n)$ (i.e., the received
OFDM signal after demodulation, but before removing the carrier offset) and the comb
filter output $y_f(n)$, respectively. If we rearrange Eq. (20.59) and take the inverse
z-transform, we find that

$$y_f(n) = y(n) + e^{j2\pi\varepsilon N}y(n-N) \tag{20.60}$$

On the other hand, for the development of an LMS-type algorithm, we remove the
expectation from the right-hand side of Eq. (20.51) and accordingly define

$$\hat{\xi}(n, \varepsilon) = |y_f(n)|^2 \tag{20.61}$$

Taking the derivative of the cost function $\hat{\xi}(n, \varepsilon)$ with respect to $\varepsilon$ gives

$$\frac{\partial\hat{\xi}(n, \varepsilon)}{\partial\varepsilon} = y_f(n)\frac{\partial y_f^*(n)}{\partial\varepsilon} + y_f^*(n)\frac{\partial y_f(n)}{\partial\varepsilon} = 2\Re\left\{y_f^*(n)\frac{\partial y_f(n)}{\partial\varepsilon}\right\} \tag{20.62}$$

where $\Re\{\cdot\}$ denotes the real part of the argument and $*$ denotes the complex conjugate.
Using Eq. (20.62), the update equation for $\varepsilon$ becomes

$$\varepsilon(n+1) = \varepsilon(n) + \frac{\mu}{4\pi N}\frac{\partial\hat{\xi}(n, \varepsilon)}{\partial\varepsilon} = \varepsilon(n) + \frac{\mu}{2\pi N}\Re\left\{y_f^*(n)\frac{\partial y_f(n)}{\partial\varepsilon}\right\} \tag{20.63}$$

where $\mu$ is the step-size parameter. We have divided the step-size $\mu$ by $4\pi N$ so as to
simplify the final expression for the update equation.

The derivative of $y_f(n)$ with respect to $\varepsilon$ is

$$\frac{\partial y_f(n)}{\partial\varepsilon} = j2\pi Ne^{j2\pi\varepsilon N}y(n-N) \tag{20.64}$$

Substituting Eqs. (20.60) and (20.64) in Eq. (20.63), we get the final form of the update
equation as

$$\varepsilon(n+1) = \varepsilon(n) + \mu\,\Im\left\{e^{-j2\pi\varepsilon N}y(n)y^*(n-N)\right\} \tag{20.65}$$

where $\Im\{\cdot\}$ denotes the imaginary part of the argument.
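
The update of Eq. (20.65) is simple enough to sketch in a few lines. The input used below is a deliberately idealized stand-in and not an OFDM waveform: a unit-magnitude sequence with period $N$, rotated by the carrier offset, so that $y(n) = e^{j2\pi\Delta f_cN}y(n-N)$ holds for every $n$, as it does for the samples of a real symbol that are repeated by the cyclic prefix. The names, the step size, and the offset value $0.375/N$ are assumptions made for the example.

```python
import cmath
import math


def blind_cfo_step(eps, y_n, y_n_minus_N, N, mu):
    """One iteration of Eq. (20.65)."""
    rot = cmath.exp(-2j * math.pi * eps * N)
    return eps + mu * (rot * y_n * y_n_minus_N.conjugate()).imag


N, mu = 16, 0.005
true_offset = 0.375 / N                  # within one-half of the subcarrier spacing
base = [cmath.exp(2j * math.pi * n * n / N) for n in range(N)]   # arbitrary unit samples
# Idealized input: period-N sequence multiplied by the carrier-offset rotation
y = [base[n % N] * cmath.exp(2j * math.pi * true_offset * n) for n in range(40 * N)]

eps = 0.0
for n in range(N, len(y)):
    eps = blind_cfo_step(eps, y[n], y[n - N], N, mu)
```

For this input the correction term is proportional to $\sin\big(2\pi N(\Delta f_c - \varepsilon)\big)$, so `eps` moves toward the true offset as long as the initial error is below one-half of the subcarrier spacing. On an actual OFDM signal only part of the samples satisfy the repetition property, and the estimate fluctuates around the true value.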


20.1.8 Channel-Tracking Methods 


The ML and LMMSE channel estimation/equalization methods that were introduced in
Section 20.1.5 for initial estimation of the channel and setting of the frequency domain
equalizers can also be extended to tracking the channel variation and accordingly
adjusting the frequency domain equalizer during the payload periods. For this purpose, pilot
symbols are spread among data symbols. A few examples of different methods of inserting
pilots among data symbols are presented in Figure 20.11.


20.2 MIMO Communication Systems 


Communication systems that use multiple antennas at both transmitter and receiver are
known as MIMO. A MIMO system with $N_t$ transmit antennas and $N_r$ receive antennas
connects the transmitter and receiver through $N_tN_r$ wireless links. One can benefit from
these links by adopting two different approaches. The first approach transmits an
independent stream of data from each antenna. This approach results in a significant increase
in the system throughput, often called space-multiplexing gain. The second approach
transmits correlated streams of data from two or more of the transmit antennas and uses proper
combining techniques at the receiver to improve the link reliability. This approach does
not increase the system throughput, but improves the quality of the channel by avoiding
deep fades in the combined channel. The result, thus, is a gain in the quality of the channel.
This is known as space-diversity gain. A combination of the two approaches is also possible.

In this section, we present a brief review of the various signaling methods that are
used to benefit from either or both of the space-multiplexing and space-diversity gains. We
also discuss the various MIMO detection methods that are used to extract the
space-multiplexed data symbols that are mixed together after passing through the channel. We
note that such detectors require an estimate of the channel response between all pairs
of transmit and receive antennas. We, thus, also introduce and review a few approaches
that may be used for channel estimation. All the developments in this section assume that
the channel gain between each pair of transmit and receive antennas is a flat gain. This
is a reasonable assumption, as OFDM with cyclic prefix can be (and is usually) used to
flatten the gains over each subcarrier channel. The MIMO-OFDM systems are discussed
in Section 20.3.

Figure 20.11 Common methods of pilot placement: (a) block, (b) comb, (c) rectangular, (d)
parallelogram, and (e) hexagonal.
20.2.1 MIMO Channel Model


Figure 20.12 depicts an example of a MIMO communication system with $N_t$ transmit
antennas and $N_r$ receive antennas. The reader may note that the basic blocks, including
the transmit pulse-shaping, $p_T(t)$, modulation and demodulation stages, and the receive
matched filtering, $p_R(t)$, are similar to those in the case of single antenna systems.
Here, we consider the case where the MIMO communication channel is flat fading;
hence, the link between each pair of transmit and receive antennas is characterized
by a complex-valued gain. Thus, the equivalent discrete-time baseband channel that
relates the transmitted symbols and the received signal samples, after matched filtering,
is expressed as

$$\mathbf{y}(n) = \mathbf{C}\mathbf{s}(n) + \mathbf{v}(n) \tag{20.66}$$

where $\mathbf{s}(n)$ is a column vector of the transmitted data symbols
$s_0(n), s_1(n), \ldots, s_{N_t-1}(n)$ at time instant $n$,
$\mathbf{y}(n) = [y_0(n)\ y_1(n)\ \cdots\ y_{N_r-1}(n)]^T$ is the vector of the received signal
samples, $\mathbf{v}(n)$ is the vector of noise samples after matched filtering, and

$$\mathbf{C} = \begin{bmatrix} c_{0,0} & c_{0,1} & \cdots & c_{0,N_t-1} \\ c_{1,0} & c_{1,1} & \cdots & c_{1,N_t-1} \\ \vdots & \vdots & \ddots & \vdots \\ c_{N_r-1,0} & c_{N_r-1,1} & \cdots & c_{N_r-1,N_t-1} \end{bmatrix} \tag{20.67}$$

is the channel gain matrix with its $ij$th element denoting the gain between the $j$th transmit
antenna and the $i$th receive antenna. We assume that the elements of $\mathbf{v}(n)$ are a set of
independent Gaussian random variables, each with variance of $\sigma_v^2$. Hence,

$$E[\mathbf{v}(n)\mathbf{v}^H(n)] = \sigma_v^2\mathbf{I} \tag{20.68}$$

where $\mathbf{I}$ is the $N_r \times N_r$ identity matrix.


20.2.2 Transmission Techniques for Space-Diversity Gain 


In a communication system with a single transmit and a single receive antenna, Eq. (20.66)
reduces to

$$y(n) = cs(n) + v(n) \tag{20.69}$$

where $c$ is the channel gain (a scalar), $s(n)$ is the transmitted symbol, and $v(n)$ is the
channel noise. To obtain an estimate of $s(n)$, one may premultiply Eq. (20.69) by
$c^*/|c|^2$ to obtain

$$\hat{s}(n) = \frac{c^*}{|c|^2}y(n) = s(n) + \frac{c^*}{|c|^2}v(n) \tag{20.70}$$

One may note that $\hat{s}(n)$ is a noisy estimate of $s(n)$ with the additive noise component
$\frac{c^*}{|c|^2}v(n)$. Noting that the latter has variance of $\sigma_v^2/|c|^2$, one will find the
SNR of the estimated symbol $\hat{s}(n)$ as

$$\text{SNR} = |c|^2\frac{\sigma_s^2}{\sigma_v^2} \tag{20.71}$$

where $\sigma_s^2$ and $\sigma_v^2$ are the variances of $s(n)$ and $v(n)$, respectively. When $|c|^2$ is
small, that is, when the channel is in a deep fade, the SNR will be small and this may lead
to a poor detection of the transmitted information.

Figure 20.13 A communication channel with one transmit antenna and two receive antennas.

Next, consider a communication system with one transmit antenna and two receive
antennas, as depicted in Figure 20.13. For this setup, Eq. (20.66) reduces to

$$\begin{bmatrix} y_0(n) \\ y_1(n) \end{bmatrix} = \begin{bmatrix} c_0 \\ c_1 \end{bmatrix}s(n) + \begin{bmatrix} v_0(n) \\ v_1(n) \end{bmatrix} \tag{20.72}$$

where $v_0(n)$ and $v_1(n)$ are additive noise samples at the two antennas. We assume that
$v_0(n)$ and $v_1(n)$ are a pair of zero-mean, Gaussian, independent, and identically
distributed variables. This implies that the vector

$$\mathbf{v}(n) = \begin{bmatrix} v_0(n) \\ v_1(n) \end{bmatrix} \tag{20.73}$$

has the covariance matrix

$$E[\mathbf{v}(n)\mathbf{v}^H(n)] = \sigma_v^2\mathbf{I} \tag{20.74}$$

where $\sigma_v^2$ is the variance of each of the elements of $\mathbf{v}(n)$ and $\mathbf{I}$ is the $2 \times 2$
identity matrix.

As the receiver task is to obtain the best estimate of $s(n)$ from the received signal
samples $y_0(n)$ and $y_1(n)$, if we limit this estimate to be based on a linear combination
of $y_0(n)$ and $y_1(n)$ (i.e., to be a linear estimator), one needs to solve the following
problem.

Find the vector $\mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$ that results in the signal estimate

$$\hat{s}(n) = \mathbf{w}^H\mathbf{y}(n) = s(n) + v(n) \tag{20.75}$$

with the maximum SNR.
The identity (20.75) implies that

$$\mathbf{w}^H\mathbf{c} = 1 \tag{20.76}$$

where $\mathbf{c} = \begin{bmatrix} c_0 \\ c_1 \end{bmatrix}$. Equation (20.76) may be thought of as a linear constraint
on $w_0$ and $w_1$. Moreover, one may note that

$$v(n) = \mathbf{w}^H\mathbf{v}(n) \tag{20.77}$$

and to maximize the SNR in Eq. (20.75), one should minimize the cost function

$$E[|v(n)|^2] = \mathbf{w}^HE[\mathbf{v}(n)\mathbf{v}^H(n)]\mathbf{w} = \sigma_v^2\mathbf{w}^H\mathbf{w} \tag{20.78}$$

where the second identity follows from Eq. (20.74).
From the above results, we conclude that to maximize the SNR in the estimator (20.75),
one should choose to minimize the cost function

$$\xi = \mathbf{w}^H\mathbf{w} \tag{20.79}$$

subject to the constraint (20.76). Obviously, such minimization can be performed using
the method of Lagrange multipliers, whose solution leads to

$$\mathbf{w} = \frac{1}{|c_0|^2 + |c_1|^2}\begin{bmatrix} c_0 \\ c_1 \end{bmatrix} \tag{20.80}$$
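
A short numerical sketch of this combiner is given below. It forms $\mathbf{w} = \mathbf{c}/(|c_0|^2 + |c_1|^2)$, checks the constraint $\mathbf{w}^H\mathbf{c} = 1$, and evaluates the output SNR, which, since the output noise variance is $\sigma_v^2\mathbf{w}^H\mathbf{w}$, equals $(|c_0|^2 + |c_1|^2)\sigma_s^2/\sigma_v^2$. The channel gains, the symbol, the variances, and the function names are made-up values for illustration.

```python
def mrc_weights(c):
    """w = c / (|c_0|^2 + |c_1|^2 + ...), the minimizer of w^H w subject to w^H c = 1."""
    g = sum(abs(ci) ** 2 for ci in c)
    return [ci / g for ci in c]


def combine(w, y):
    """Linear estimate s_hat = w^H y."""
    return sum(wi.conjugate() * yi for wi, yi in zip(w, y))


def output_snr(c, sigma_s2, sigma_v2):
    """SNR of s_hat: signal variance over the noise variance sigma_v^2 * w^H w."""
    w = mrc_weights(c)
    noise_gain = sum(abs(wi) ** 2 for wi in w)      # w^H w = 1 / sum |c_i|^2
    return sigma_s2 / (sigma_v2 * noise_gain)


c = [0.3 - 0.4j, 1.2 + 0.5j]             # hypothetical channel gains c_0 and c_1
w = mrc_weights(c)
s = 1 - 1j                               # hypothetical transmitted symbol
y = [ci * s for ci in c]                 # noise-free received samples
s_hat = combine(w, y)
snr = output_snr(c, sigma_s2=1.0, sigma_v2=0.1)
```

In the example the first link is weak ($|c_0|^2 = 0.25$), yet the combined SNR is governed by the sum $|c_0|^2 + |c_1|^2 = 1.94$.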
Substituting Eqs. (20.80) and (20.72) in Eq. (20.75), we obtain

$$\hat{s}(n) = s(n) + \frac{c_0^*v_0(n) + c_1^*v_1(n)}{|c_0|^2 + |c_1|^2} \tag{20.81}$$

Comparing Eqs. (20.70) and (20.81), one may find that while in the single-input
single-output case the system will suffer from the channel fade when the gain $c$ is small, a
system with a single transmit and two receive antennas will only suffer if both gains $c_0$
and $c_1$ are simultaneously in deep fades. Obviously, the chance of the latter is much smaller
than that of the former. We thus say that the introduction of diversity in the channel, by
introducing an additional link, leads to a more reliable communication link. This added
reliability, as mentioned earlier, is called space-diversity gain.

The setup presented in Figure 20.13 often occurs when a mobile station transmits to
a base station. The situation will be reversed when a base station transmits to a mobile
station. This is presented in Figure 20.14. A question to ask here is how one can benefit
from the diversity gain of this setup. In other words, can one design a scheme in which
each transmit symbol goes through both the links $c_0$ and $c_1$ before reaching the receiver?
It turns out that the answer to this question is positive, and the solution is provided
by using code words that span across both time and space. Such code words are called
space-time block codes.

An important subset of space-time block codes that allow simple decoding (separation
of the transmitted symbols at the receiver) is that of the orthogonal space-time block
(OSTB) codes. A popular OSTB code that has been widely used in practice is the so-called
Alamouti code (Alamouti, 1998). It is designed for MIMO systems with two transmit antennas
and is defined by the code matrix

$$\mathbf{S} = \begin{bmatrix} s_0 & s_1 \\ s_1^* & -s_0^* \end{bmatrix} \tag{20.82}$$
Figure 20.14 A communication channel with two transmit antennas and one receive antenna.


Each row of this matrix indicates the pair of symbols that are transmitted from the two
transmit antennas in one time slot. The same pair of symbols is transmitted in the next
time slot after conjugation and possibly a sign change. Note that in this arrangement two
symbols are transmitted in two time slots, which is clearly equivalent to transmitting one
symbol per time slot. We also note that the code matrix S satisfies


$$\mathbf{S}^{\mathrm H}\mathbf{S} = \left(|s_0|^2 + |s_1|^2\right)\mathbf{I} \tag{20.83}$$


where I is the identity matrix, hence the name OSTB code. 

Assuming that the pair of data symbols s(n) and s(n + 1) is transmitted according to
the Alamouti code, the pair of signal samples received at the time slots n and n + 1 is
given by

$$\begin{bmatrix} y(n) \\ y(n+1) \end{bmatrix} = \begin{bmatrix} s(n) & s(n+1) \\ s^*(n+1) & -s^*(n) \end{bmatrix}\begin{bmatrix} c_0 \\ c_1 \end{bmatrix} + \begin{bmatrix} v(n) \\ v(n+1) \end{bmatrix} \tag{20.84}$$

To extract estimates of s(n) and s(n + 1) from Eq. (20.84), we first note that Eq. (20.84)
can be rearranged as

$$\begin{bmatrix} y(n) \\ y^*(n+1) \end{bmatrix} = \begin{bmatrix} c_0 & c_1 \\ -c_1^* & c_0^* \end{bmatrix}\begin{bmatrix} s(n) \\ s(n+1) \end{bmatrix} + \begin{bmatrix} v(n) \\ v^*(n+1) \end{bmatrix} \tag{20.85}$$

Defining

$$\mathbf{C} = \begin{bmatrix} c_0 & c_1 \\ -c_1^* & c_0^* \end{bmatrix} \tag{20.86}$$

and multiplying Eq. (20.85) through from the left-hand side by $\frac{1}{|c_0|^2 + |c_1|^2}\mathbf{C}^{\mathrm H}$, we obtain

$$\begin{bmatrix} \hat{s}(n) \\ \hat{s}(n+1) \end{bmatrix} = \frac{1}{|c_0|^2 + |c_1|^2}\,\mathbf{C}^{\mathrm H}\begin{bmatrix} y(n) \\ y^*(n+1) \end{bmatrix} = \begin{bmatrix} s(n) \\ s(n+1) \end{bmatrix} + \begin{bmatrix} \dfrac{c_0^* v(n) - c_1 v^*(n+1)}{|c_0|^2 + |c_1|^2} \\[2ex] \dfrac{c_1^* v(n) + c_0 v^*(n+1)}{|c_0|^2 + |c_1|^2} \end{bmatrix} \tag{20.87}$$

Comparing the first line of this result with Eq. (20.81), one finds that, apart from a
change in the phase of the gains, which has no consequence on the SNR values, the
estimates of s(n) in both cases are the same. That is, the space-diversity gain achieved
in a system with one transmit and two receive antennas is also achieved when information
is transmitted through two antennas and received by one antenna.
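
The decoding steps of Eqs. (20.84) through (20.87) can be checked with a few lines of code. The sketch below is only illustrative: the function names and the numerical values are hypothetical, and it assumes the sign convention in which the second time slot carries $s^*(n+1)$ and $-s^*(n)$, which is the convention consistent with the signs of the noise terms in Eq. (20.87).

```python
def alamouti_encode(s0, s1):
    """Code matrix: rows are time slots, columns are transmit antennas."""
    return [[s0, s1],
            [s1.conjugate(), -s0.conjugate()]]


def receive(S, c, v=(0j, 0j)):
    """Single receive antenna: y(n+i) = S[i][0]*c0 + S[i][1]*c1 + v(n+i)."""
    return [S[i][0] * c[0] + S[i][1] * c[1] + v[i] for i in range(2)]


def alamouti_decode(y, c):
    """Multiply [y(n), y*(n+1)] by C^H / (|c0|^2 + |c1|^2), as in Eq. (20.87)."""
    g = abs(c[0]) ** 2 + abs(c[1]) ** 2
    y0, y1_conj = y[0], y[1].conjugate()
    s0_hat = (c[0].conjugate() * y0 - c[1] * y1_conj) / g
    s1_hat = (c[1].conjugate() * y0 + c[0] * y1_conj) / g
    return s0_hat, s1_hat


c = (0.3 - 0.8j, -0.6 + 0.1j)           # hypothetical channel gains
s0, s1 = 1 + 1j, -1 + 1j                # hypothetical data symbols
y = receive(alamouti_encode(s0, s1), c)  # noise-free reception
s0_hat, s1_hat = alamouti_decode(y, c)
print(s0_hat, s1_hat)
```

In the absence of noise the two symbols are separated exactly, because the off-diagonal terms of $\mathbf{C}^{\mathrm H}\mathbf{C}$ cancel.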


20.2.3 Transmission Techniques and MIMO Detectors for 
Space-Multiplexing Gain 


Unlike the space-diversity case, where correlated symbols are transmitted from all the
transmit antennas, in space-multiplexing the data streams transmitted from the transmit
antennas are independent of one another. When $N_r \ge N_t$, noise is absent, and the channel
gain matrix C and the received signal y(n) are known, one can find the transmit symbol
vector s(n) perfectly by solving the equation Cs(n) = y(n), if the rank of C is equal to
$N_t$. In the presence of noise, a noisy estimate of s(n) will be obtained. The point
to note here is that even though the symbol streams $s_0(n)$ through $s_{N_t-1}(n)$ are
transmitted simultaneously and over the same band, they are detectable at the receiver,
thanks to the use of multiple antennas at the receiver that provide sufficient samples
for the detection. In other words, the amount of information (considering that each symbol
carries a certain amount of information) transmitted per unit of time and over a unit of
bandwidth increases as $N_t$ increases, as long as there are sufficient antennas at the receiver.
Here, without involving ourselves in theoretical details, we note that the capacity of a
MIMO channel, that is, given a total transmit power P and a bandwidth B, the amount of
information that can be reliably transmitted per unit of time, increases with the minimum
of $N_t$ and $N_r$. This property of MIMO systems is often referred to as capacity gain.

In the rest of this section, we present a number of signal processing methods that 
attempt to estimate the transmitted symbols/information, given the received signal y(n) 
and the channel gain matrix C. The first three methods obtain various estimates of the 
vector s(n), directly. The last two methods are to generate soft values of the detected bits 
following the same principles that the soft equalizers of Chapter 17 were based on. 


Zero-Forcing Detector 


In the zero-forcing (ZF) detector, an estimate of s(n) is obtained as

$$\hat{\mathbf{s}}(n) = \mathbf{C}^{\dagger}\mathbf{y}(n) \tag{20.88}$$

where $\mathbf{C}^{\dagger}$ is the pseudoinverse of C given by

$$\mathbf{C}^{\dagger} = (\mathbf{C}^{\mathrm H}\mathbf{C})^{-1}\mathbf{C}^{\mathrm H} \tag{20.89}$$

Substituting Eqs. (20.66) and (20.89) in Eq. (20.88), we obtain

$$\hat{\mathbf{s}}(n) = \mathbf{s}(n) + (\mathbf{C}^{\mathrm H}\mathbf{C})^{-1}\mathbf{C}^{\mathrm H}\mathbf{v}(n) \tag{20.90}$$

This is a noisy estimate of s(n) with a noise vector

$$\mathbf{v}'(n) = (\mathbf{C}^{\mathrm H}\mathbf{C})^{-1}\mathbf{C}^{\mathrm H}\mathbf{v}(n) \tag{20.91}$$

Obviously, the performance of the ZF detector is determined by the variance of the
elements of v'(n), which is quantified by the diagonal elements of its covariance matrix.
Using Eqs. (20.91) and (20.68), the covariance matrix of v'(n) is obtained as

$$E[\mathbf{v}'(n)\mathbf{v}'^{\mathrm H}(n)] = \sigma_v^2(\mathbf{C}^{\mathrm H}\mathbf{C})^{-1} \tag{20.92}$$

This shows that the size of the elements of $E[\mathbf{v}'(n)\mathbf{v}'^{\mathrm H}(n)]$ is related to the size of the
elements of the matrix $(\mathbf{C}^{\mathrm H}\mathbf{C})^{-1}$, which, in turn, is related to the condition number or,
equivalently, the spread of eigenvalues of the matrix $\mathbf{C}^{\mathrm H}\mathbf{C}$. When one or more of the
eigenvalues of $\mathbf{C}^{\mathrm H}\mathbf{C}$ are small, the elements of $(\mathbf{C}^{\mathrm H}\mathbf{C})^{-1}$ will be large and, hence, the
ZF detector performs poorly. More specifically, one may quantify the performance of the
ZF detector by considering the SNR values at the detector outputs. These SNR values are
defined as

$$\mathrm{SNR}_k = \frac{\sigma_s^2}{\sigma_{v'_k}^2} \tag{20.93}$$

where $\sigma_s^2 = E[|s_k(n)|^2]$ and $\sigma_{v'_k}^2 = E[|v'_k(n)|^2]$. Note that while we assume $\sigma_s^2$ is the same
for the data symbols transmitted from different antennas, $\sigma_{v'_k}^2$ differs for various choices
of k and also varies with the channel gain C.
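
A short numerical sketch of Eqs. (20.88) through (20.93) is given below. It assumes NumPy is available; the function names and the 3 × 3 channel matrix are hypothetical and are not taken from the numerical examples of this chapter.

```python
import numpy as np


def zf_detector(C):
    """Pseudoinverse of Eq. (20.89), (C^H C)^{-1} C^H, for a full-column-rank C."""
    Ch = C.conj().T
    return np.linalg.solve(Ch @ C, Ch)


def zf_output_snr_db(C, sigma_s2, sigma_v2):
    """Per-output SNR of Eq. (20.93), in dB, using the covariance of Eq. (20.92)."""
    noise_var = sigma_v2 * np.real(np.diag(np.linalg.inv(C.conj().T @ C)))
    return 10 * np.log10(sigma_s2 / noise_var)


# Hypothetical, well-conditioned 3 x 3 channel (not one of the book's examples).
C = np.array([[1.0, 0.5j, 0.0],
              [0.2, 1.0, -0.3j],
              [0.0, 0.1, 1.0]])
s = np.array([1 + 1j, -1 + 1j, 1 - 1j])   # transmitted symbol vector
y = C @ s                                  # noise-free received vector
s_hat = zf_detector(C) @ y                 # recovers s up to round-off
snr_db = zf_output_snr_db(C, sigma_s2=1.0, sigma_v2=0.01)
print(np.round(s_hat, 6), np.round(snr_db, 2))
```

Replacing `C` by a matrix with one very small eigenvalue of $\mathbf{C}^{\mathrm H}\mathbf{C}$ shows the noise enhancement described above: the corresponding diagonal elements of $(\mathbf{C}^{\mathrm H}\mathbf{C})^{-1}$ grow and the output SNR values drop.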


Numerical Examples: Part 1 


Some of the properties of the ZF detector can be best understood through numerical
examples. Let us consider the following two choices of C:

$$\mathbf{C}_1 = \begin{bmatrix} 0.4209 + 0.0060i & -0.3859 + 0.1032i & 0.2284 - 0.0397i \\ -0.0327 + 0.1392i & 0.2426 - 0.0631i & 0.4007 - 0.1766i \\ -0.0280 + 0.1797i & -0.0037 - 0.4032i & -0.3683 - 0.0566i \end{bmatrix}$$

and

$$\mathbf{C}_2 = \begin{bmatrix} 0.0382 - 0.0424i & -0.0099 - 0.2285i & 0.1134 + 0.4595i \\ 0.4010 + 0.0661i & -0.0344 + 0.4154i & 0.3406 + 0.1319i \\ 0.2441 - 0.1635i & -0.1571 + 0.3103i & 0.0105 + 0.2018i \end{bmatrix}$$

The first choice is a poorly conditioned matrix; the eigenvalues of $\mathbf{C}_1^{\mathrm H}\mathbf{C}_1$ are 0.0081,
0.8843, and 1.1777. The second choice is a moderately conditioned matrix; the eigenvalues
of $\mathbf{C}_2^{\mathrm H}\mathbf{C}_2$ are 0.0637, 0.2553, and 0.6810.

Letting $\sigma_v^2 = 0.01$, the SNR values at the detector outputs (calculated according to Eq.
(20.93) for the two cases) are obtained as


For $\mathbf{C}_1$: SNR = −5.46, −4.12, 1.18 dB
For $\mathbf{C}_2$: SNR = 9.92, 11.37, 14.42 dB


Clearly, as expected, the poorly conditioned channel gain $\mathbf{C}_1$ leads to very low SNR
values at the detector outputs.


Minimum Mean-Squared Error Detector


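The minimum mean-squared error (MMSE) detector finds an estimate $\hat{\mathbf{s}}(n)$ of s(n)
according to the formula

$$\hat{\mathbf{s}}(n) = \mathbf{G}^{\mathrm H}\mathbf{y}(n) \tag{20.94}$$

and sets G by minimizing the cost function

$$\xi = E[\mathbf{e}^{\mathrm H}(n)\mathbf{e}(n)] \tag{20.95}$$

where

$$\mathbf{e}(n) = \mathbf{s}(n) - \hat{\mathbf{s}}(n) \tag{20.96}$$

One may thus note that minimization of the cost function $\xi$ leads to an MMSE solution,
hence the name MMSE detector.

Minimization of $\xi$ leads to the solution

$$\mathbf{G} = \left(\mathbf{C}\mathbf{C}^{\mathrm H} + \frac{\sigma_v^2}{\sigma_s^2}\mathbf{I}\right)^{-1}\mathbf{C} \tag{20.97}$$

An alternative form of this result is obtained by applying the matrix inversion lemma to
Eq. (20.97). This leads to

$$\mathbf{G}^{\mathrm H} = \left(\mathbf{C}^{\mathrm H}\mathbf{C} + \frac{\sigma_v^2}{\sigma_s^2}\mathbf{I}\right)^{-1}\mathbf{C}^{\mathrm H} \tag{20.98}$$

An interesting observation that can be made using Eq. (20.98) is that when $\sigma_v^2 = 0$,
$\mathbf{G}^{\mathrm H} = \mathbf{C}^{\dagger}$. That is, when there is no noise, the MMSE and ZF detectors are the same.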
Substituting Eq. (20.98) in Eq. (20.94) and recalling Eq. (20.66), we obtain

$$\hat{\mathbf{s}}(n) = \mathbf{H}\mathbf{s}(n) + \mathbf{v}'(n) \tag{20.99}$$

where

$$\mathbf{H} = \left(\mathbf{C}^{\mathrm H}\mathbf{C} + \frac{\sigma_v^2}{\sigma_s^2}\mathbf{I}\right)^{-1}\mathbf{C}^{\mathrm H}\mathbf{C} \tag{20.100}$$

and

$$\mathbf{v}'(n) = \mathbf{G}^{\mathrm H}\mathbf{v}(n) \tag{20.101}$$

Using Eqs. (20.101) and (20.68), the covariance matrix of v'(n) is obtained as

$$E[\mathbf{v}'(n)\mathbf{v}'^{\mathrm H}(n)] = \sigma_v^2\mathbf{G}^{\mathrm H}\mathbf{G} \tag{20.102}$$

Unlike the ZF detector, in which interference among symbols is completely removed, in
the MMSE detector there is usually some residual ISI. Hence, to measure the performance
of the MMSE detector, one should evaluate the signal-to-interference plus noise ratio
(SINR). Recalling Eq. (20.99), one finds that at the kth output of the MMSE detector

$$\mathrm{SINR}_k = \frac{\sigma_s^2 |h_{kk}|^2}{\sigma_s^2 \sum_{l \ne k} |h_{kl}|^2 + \sigma_{v'_k}^2} \tag{20.103}$$

where $h_{kl}$ denotes the klth element of H and $\sigma_{v'_k}^2$ is the kth diagonal element of the matrix
$\sigma_v^2\mathbf{G}^{\mathrm H}\mathbf{G}$.


Numerical Examples: Part 2


Following the above equations, the SINR values of the MMSE detector for the channels
$\mathbf{C}_1$ and $\mathbf{C}_2$ of “Numerical Examples: Part 1” are obtained as


For $\mathbf{C}_1$: SINR = 0.71, 2.89, 8.79 dB
For $\mathbf{C}_2$: SINR = 10.10, 11.60, 14.51 dB


Comparing these numbers with their counterparts for the ZF detector, one may conclude
that while for moderately conditioned channels the difference between the ZF and MMSE
detectors may not be significant, the two detectors perform significantly differently when
the channel is poorly conditioned.
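
The MMSE detector of Eq. (20.98) and the SINR of Eq. (20.103) can be evaluated along the following lines. This is a minimal sketch that assumes NumPy; the function names and the channel matrix are hypothetical and are not those of the numerical examples above.

```python
import numpy as np


def mmse_detector(C, sigma_s2, sigma_v2):
    """G^H of Eq. (20.98): (C^H C + (sigma_v^2 / sigma_s^2) I)^{-1} C^H."""
    Ch = C.conj().T
    reg = (sigma_v2 / sigma_s2) * np.eye(C.shape[1])
    return np.linalg.solve(Ch @ C + reg, Ch)


def mmse_sinr_db(C, sigma_s2, sigma_v2):
    """Per-output SINR of Eq. (20.103), in dB."""
    Gh = mmse_detector(C, sigma_s2, sigma_v2)
    H = Gh @ C                                                   # Eq. (20.100)
    noise_var = sigma_v2 * np.real(np.diag(Gh @ Gh.conj().T))   # diagonal of Eq. (20.102)
    power = np.abs(H) ** 2
    signal = sigma_s2 * np.diag(power)
    interference = sigma_s2 * (power.sum(axis=1) - np.diag(power))
    return 10 * np.log10(signal / (interference + noise_var))


# Hypothetical 3 x 3 channel (not one of the book's examples).
C = np.array([[1.0, 0.5j, 0.0],
              [0.2, 1.0, -0.3j],
              [0.0, 0.1, 1.0]])
sinr_db = mmse_sinr_db(C, sigma_s2=1.0, sigma_v2=0.01)
print(np.round(sinr_db, 2))
```

Setting the noise variance to a very small value makes `mmse_detector` approach the pseudoinverse of C, in line with the observation that the MMSE and ZF detectors coincide when there is no noise.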


Successive Interference Cancellation Detector 


From the numerical results presented above for channels $\mathbf{C}_1$ and $\mathbf{C}_2$, one may observe that
the performance of both ZF and MMSE detectors for different symbols in a data vector
s(n) may vary significantly. For instance, for channel $\mathbf{C}_1$, the MMSE detector detects
$s_2(n)$ with an SINR of 8.79 dB, $s_1(n)$ with an SINR of 2.89 dB, and $s_0(n)$ with an SINR
of 0.71 dB. With these values of SINR, one may argue that while $s_2(n)$ may be detected
correctly with a high probability, $s_1(n)$ and $s_0(n)$ cannot be detected reliably because they
are corrupted by a significant amount of noise/interference. This observation has led to
the design of a class of detectors that operate by taking the following steps:


1. Assuming that the channel gain matrix C and the noise variance $\sigma_v^2$ are known, design
a ZF/MMSE detector.

2. Detect the symbol $s_k(n)$ with the largest SNR/SINR.

3. Subtract the effect of the detected symbol $s_k(n)$ from the received signal y(n). This is
equivalent to removing $s_k(n)$ from the vector s(n) and also removing the kth column of C.
The new equation that relates the remaining elements of s(n) and the associated received
signal will be

$$\mathbf{y}_{-k}(n) = \mathbf{C}_{-k}\mathbf{s}_{-k}(n) + \mathbf{v}(n) \tag{20.104}$$

where the subscript “−k” on s(n) means the kth element of s(n) is removed, on C
means the kth column of C is removed, and on y(n) indicates that the effect of $s_k(n)$
has been removed.

4. Consider Eq. (20.104) as the channel equation and repeat Steps 1, 2, and 3 until all
the symbols are detected.


This detector is called the successive interference cancellation (SIC) detector, for
obvious reasons. We note that, since after detection of each symbol the number of
transmit symbols in the channel equation (20.104) reduces by 1 while the number of receive
antennas remains unchanged, one would expect to see improvement in the detection of the
remaining symbols. Of course, this improvement is guaranteed only if the detected symbols
are correct. An error in detection will result in error propagation and thus may lead to
poor performance of the SIC detector.


Numerical Examples: Part 3


Following the above procedure, the SINR values of the SIC detector for the channels $\mathbf{C}_1$
and $\mathbf{C}_2$ of the above numerical examples are obtained as


For $\mathbf{C}_1$: SINR = 13.63, 10.82, 8.79 dB
For $\mathbf{C}_2$: SINR = 14.06, 12.16, 14.51 dB


Comparing these numbers with their counterparts for the ZF/MMSE detectors, one observes
a significant performance gain, particularly for $\mathbf{C}_1$.
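
The four steps of the SIC detector may be sketched as follows for the MMSE-based variant. This is an illustrative fragment only: it assumes NumPy, QPSK symbols with elements ±1 ± j (so that a sign-based slicer suffices), and hypothetical function names and channel values.

```python
import numpy as np


def qpsk_slicer(z):
    """Hypothetical hard decision for symbols of the form +/-1 +/- 1j."""
    return (1 if z.real >= 0 else -1) + 1j * (1 if z.imag >= 0 else -1)


def mmse_sic_detect(y, C, sigma_s2, sigma_v2, slicer=qpsk_slicer):
    """Detect the symbol with the largest SINR, cancel it, and repeat."""
    y = np.array(y, dtype=complex)
    C = np.array(C, dtype=complex)
    remaining = list(range(C.shape[1]))       # original indices still undetected
    s_hat = np.zeros(C.shape[1], dtype=complex)
    while remaining:
        # Step 1: MMSE detector for the current (reduced) channel.
        Ch = C.conj().T
        reg = (sigma_v2 / sigma_s2) * np.eye(len(remaining))
        Gh = np.linalg.solve(Ch @ C + reg, Ch)
        power = np.abs(Gh @ C) ** 2
        noise = sigma_v2 * np.real(np.diag(Gh @ Gh.conj().T))
        signal = sigma_s2 * np.diag(power)
        sinr = signal / (sigma_s2 * (power.sum(axis=1) - np.diag(power)) + noise)
        # Step 2: detect the symbol with the largest SINR.
        k = int(np.argmax(sinr))
        decision = slicer(Gh[k] @ y)
        s_hat[remaining[k]] = decision
        # Step 3: cancel its contribution and drop the kth column of C.
        y = y - C[:, k] * decision
        C = np.delete(C, k, axis=1)
        remaining.pop(k)                      # Step 4: repeat on what is left
    return s_hat


C = np.array([[1.0, 0.5j, 0.0],
              [0.2, 1.0, -0.3j],
              [0.0, 0.1, 1.0]])               # hypothetical channel
s = np.array([1 + 1j, -1 + 1j, 1 - 1j])
s_hat = mmse_sic_detect(C @ s, C, sigma_s2=2.0, sigma_v2=0.01)
print(s_hat)
```

The sketch makes hard decisions at every stage, so a wrong decision is subtracted from the received signal as if it were correct; this is the error propagation mechanism mentioned above.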


Soft MIMO Detectors 


The soft equalization methods (i.e., the soft MMSE equalizer and the statistical soft
equalizer) that were presented in Section 17.9 can be easily extended to MIMO detectors
as well. These methods have been widely studied in the literature. Examples
of the relevant literature are as follows: (i) The soft MMSE-MIMO detectors can be found
in Abe and Matsumoto (2003) and Liu and Tian (2004). (ii) The MCMC-MIMO detectors that
follow the statistical soft equalization method discussed in Chapter 17 can be
found in Farhang-Boroujeny, Zhu, and Shi (2006) and Chen et al. (2010). Yet another
class of soft MIMO detectors has been proposed using the method of sphere decoding; see
Hochwald and ten Brink (2003) and Vikalo, Hassibi, and Kailath (2004). A comparison of
MCMC and sphere decoding MIMO detectors can be found in Zhu, Farhang-Boroujeny,
and Chen (2005).


20.2.4 Channel Estimation Methods 


As discussed in Chapter 17 and also earlier in this chapter, with regard to OFDM, channel 
estimation is usually performed through the use of a training sequence at the beginning of 
each packet or through training/pilot symbols that are inserted between data symbols. In 
a single-input single-output channel, when the channel is characterized by a flat gain, the 
minimum number of pilot symbols required to estimate the channel gain is 1. Additional 
pilot symbols may be used to average out the effect of channel noise and hence obtain a 
more accurate estimate of the channel. 

In a MIMO channel with flat gain, one has to estimate the channel gain matrix C. For
this purpose, considering the channel equation (20.66), one may note that transmitting a
single pilot vector, say, $\mathbf{s}_p$, may not be sufficient to estimate all the $N_r N_t$ elements of C.
Here, one needs to transmit a pilot matrix $\mathbf{S}_p$ of size $N_t \times L$, where $L \ge N_t$. Each column
of $\mathbf{S}_p$ is a set of pilots transmitted from the $N_t$ transmit antennas. Combining the received
signals from the L pilot symbol vectors, the channel equation relating $\mathbf{S}_p$ with the matrix
of received signal samples reads

$$\mathbf{Y}_p = \mathbf{C}\mathbf{S}_p + \mathbf{V} \tag{20.105}$$

where the matrix V contains channel noise samples. In Eq. (20.105), both $\mathbf{Y}_p$ and V are
matrices of size $N_r \times L$. Assuming that $\mathbf{S}_p$ is of rank $N_t$, an estimate of C can be obtained
by multiplying both sides of Eq. (20.105) from the right by $\mathbf{S}_p^{\mathrm H}(\mathbf{S}_p\mathbf{S}_p^{\mathrm H})^{-1}$, to obtain

$$\hat{\mathbf{C}} = \mathbf{Y}_p\mathbf{S}_p^{\mathrm H}(\mathbf{S}_p\mathbf{S}_p^{\mathrm H})^{-1} = \mathbf{C} + \mathbf{V}\mathbf{S}_p^{\mathrm H}(\mathbf{S}_p\mathbf{S}_p^{\mathrm H})^{-1} \tag{20.106}$$


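A convenient choice for $\mathbf{S}_p$ is a square matrix, that is, $L = N_t$, that is also orthogonal,
that is, $\mathbf{S}_p\mathbf{S}_p^{\mathrm H} = K\mathbf{I}$, where K is a constant. Hence,

$$\hat{\mathbf{C}} = \frac{1}{K}\mathbf{Y}_p\mathbf{S}_p^{\mathrm H} = \mathbf{C} + \frac{1}{K}\mathbf{V}\mathbf{S}_p^{\mathrm H} \tag{20.107}$$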


A trivial choice of $\mathbf{S}_p$ that satisfies the orthogonality condition $\mathbf{S}_p\mathbf{S}_p^{\mathrm H} = K\mathbf{I}$ is
$\mathbf{S}_p = \sqrt{K}\,\mathbf{I}$. This corresponds to the case where at each instant of time a pilot symbol is
transmitted from one of the antennas, and the rest of the antennas stay quiet. In that case,
the channel estimate is trivially given by

$$\hat{\mathbf{C}} = \frac{1}{\sqrt{K}}\mathbf{Y}_p \tag{20.108}$$

This method is often referred to as the independent pilot pattern: independent pilots are
transmitted from different antennas. The independent pilot pattern may be problematic
in cases where the amplifiers at the transmitters are power limited, because injecting all
the transmit power into one of the transmitter amplifiers limits the amount of power used
for channel estimation and hence reduces the quality of the estimated channel.

An alternative pilot pattern, known as the orthogonal pilot pattern, selects an $\mathbf{S}_p$ in which
all the elements are of the same level and the orthogonality condition $\mathbf{S}_p\mathbf{S}_p^{\mathrm H} = K\mathbf{I}$ holds.
Examples of such designs for 2, 3, and 4 transmit antennas are, respectively,


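• $N_t = 2$

$$\mathbf{S}_p = \sqrt{\frac{K}{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \tag{20.109}$$

• $N_t = 3$

$$\mathbf{S}_p = \sqrt{\frac{K}{3}}\begin{bmatrix} 1 & 1 & 1 \\ 1 & e^{j2\pi/3} & e^{-j2\pi/3} \\ 1 & e^{-j2\pi/3} & e^{j2\pi/3} \end{bmatrix} \tag{20.110}$$
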
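• $N_t = 4$

$$\mathbf{S}_p = \sqrt{\frac{K}{4}}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \end{bmatrix} \tag{20.111}$$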
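
The pilot-based estimates of Eqs. (20.106) and (20.107) can be tried out numerically as sketched below. The fragment assumes NumPy; the 3 × 2 channel, the noise level, the value of K, and the function name are hypothetical choices made only for illustration.

```python
import numpy as np


def estimate_channel(Yp, Sp):
    """Channel estimate of Eq. (20.106): Y_p S_p^H (S_p S_p^H)^{-1}."""
    SpH = Sp.conj().T
    return Yp @ SpH @ np.linalg.inv(Sp @ SpH)


K = 4.0                                               # hypothetical constant in S_p S_p^H = K I
Sp = np.sqrt(K / 2) * np.array([[1.0, 1.0],
                                [1.0, -1.0]])         # orthogonal pilots for two transmit antennas

rng = np.random.default_rng(1)
C = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))   # hypothetical channel
sigma_v = 0.05                                        # hypothetical noise standard deviation
V = sigma_v * (rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))) / np.sqrt(2)

Yp = C @ Sp + V                                       # Eq. (20.105)
C_hat = estimate_channel(Yp, Sp)                      # general form, Eq. (20.106)
C_hat_orth = Yp @ Sp.conj().T / K                     # simplified form, Eq. (20.107)
print(np.linalg.norm(C_hat - C))
```

For orthogonal pilots the two forms give the same estimate, and the estimation error is the noise term $\frac{1}{K}\mathbf{V}\mathbf{S}_p^{\mathrm H}$, which shrinks as the pilot energy K grows.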
20.3 MIMO-OFDM


Extension of OFDM to MIMO channels is straightforward. Figure 20.15 presents a block
diagram of a MIMO-OFDM system. For brevity, modulators/demodulators to/from RF
are not included. Separate OFDM symbol/signal sequences are synchronously transmitted
from different transmit antennas. At the receiver, a set of OFDM demodulators separate
the received signals into their respective subcarriers. This includes the removal of the CP
from each OFDM symbol and applying an FFT. Recalling the discussion on OFDM in
Section 20.1, one may note that OFDM partitions the channel into a set of flat-fading
MIMO channels expressed by the set of equations

$$\mathbf{y}_k(n) = \mathbf{C}_k\mathbf{s}_k(n) + \mathbf{v}_k(n), \quad k \in \mathcal{K} \tag{20.112}$$

where $\mathcal{K}$ indicates the indices of the active subcarriers.

All the methods related to MIMO systems that were discussed in Section 20.2 are
obviously applicable to the set of channel models (20.112) as well. In particular, the
space-diversity and/or space-multiplexing methods can be applied to each subcarrier
channel separately. Moreover, the channel estimation methods that were discussed in
Section 20.2.4 can be readily extended to each subcarrier channel. In addition, the pilot
spreading and channel estimation methods that were developed for OFDM systems can
be extended to MIMO-OFDM. A few relevant references that interested readers may consult
are Ghosh et al. (2011), Chiueh, Tsai, and Lai (2012), and Stüber et al. (2004).


Problems 


P20.1 Study and present your observations on the effect of cyclic prefix extension of x(n)
on the individual tones embedded in x(n). Specifically, confirm that the cyclic
prefix extension does not introduce any discontinuity in the time index of the
tones.


P20.2 


(i) Starting with the MATLAB script “OFDM.m,” available on the accompanying
website, add the necessary line to form the autocorrelation coefficients K (n)
and plot the result to confirm that you obtain a plot similar to the one in
Figure 20.3.

(ii) Provide the necessary arguments to prove that the time indices $n_1$, $n_2$, $n_3$,
and $n_4$ of Figure 20.3 are related accordingly as


(iii) Check to see whether these relationships are applicable to the plot you obtained in
(i).

(iv) Do the observations made above change in the presence of a carrier frequency
offset? Examine the developed code and also provide a theoretical explanation.


Figure 20.15 A MIMO-OFDM system.


P20.3 In the MATLAB script “OFDM.m,” available on the accompanying website, let the
carrier frequency offset be zero (Dfc = 0) and the noise variance be set equal to
$10^{-8}$ (sigmav = 0.0001). Also, to begin, assume that the channel is ideal (c = 1).

(i) Complete the MATLAB script “OFDM.m” to perform the tasks of timing
recovery/acquisition, equalizer setting, and data detection. Here, for setting
the equalizer coefficients, training symbol $T_1$ can be used. Compare the
recovered data symbols with the transmitted ones to confirm that your code
works correctly. Also, by finding the difference between the recovered and
transmitted data symbols, obtain an estimate of the MSE at the receiver
output.

(ii) Repeat (i) when the training symbol $T_2$ is used to adjust the equalizer
coefficients, and compare your results with those of (i).

(iii) Repeat (i) when the average of the training symbols $T_1$ and $T_2$ is used to
adjust the equalizer coefficients, and compare your results with those of (i).

(iv) Repeat the above parts when the ideal channel c = 1 is replaced by
c = [1; zeros(47,1); 0.5; zeros(28,1); 0.8].


P20.4 Repeat problem P20.3 for the case where the carrier frequency offset Dfc is
nonzero. Note that, in this case, your code should first perform carrier recovery and
correct the received signal “y” according to the estimated carrier offset, before
proceeding with setting the equalizer coefficients and data detection. Examine
your code for values of Dfc = 1, 2, and 10.


P20.5 Recall the time window function g(n) of Eq. (20.47). Also, recall from the results
in Chapter 2 that if $f(n) = g(n) \star g(-n)$, then $F(\omega) = |G(\omega)|^2$.

(i) Show that

$$f(n) = \begin{cases} N + N_0 - |n|, & \text{for } |n| < N + N_0 \\ 0, & \text{otherwise} \end{cases}$$

(ii) Using the result of (i), show that $F(\omega)$ can be arranged in the form given
on the right-hand side of Eq. (20.50).


P20.6 Develop and examine the necessary MATLAB codes to confirm the numerical
results presented in “Numerical Examples: Part 1 through 3” of Section 20.2.3.


P20.7 Show that the solution to the minimization of the cost function (20.95) is Eq.
(20.97).

Hint: Note that the cost function $\xi$ may be divided into the independent cost
functions $\xi_k = E[|s_k(n) - \hat{s}_k(n)|^2]$, for $k = 1, 2, \ldots, N_t$.


P20.8 By invoking the matrix inversion lemma, show that Eq. (20.97) may be rearranged
as Eq. (20.98).
mean-squared error criterion, 50 
measurement noise, 68 
MIMO communication systems 730 
channel estimation methods in, 741 
channel model in, 732 
MIMO-OFDM, 744 
transmission techniques and detectors 
for space-multiplexing gain, 737 
transmission techniques for 
space-diversity gain, 732 
minimax theorem, 94, 164, 214, 219 
eigenanalysis of particular matrices, in, 
101, 116 
minimum mean-squared error, 53, 66, 
minimum mean-squared prediction error, 
357-8 
minimum mean-squared error criterion, 49 
minimum mean-squared error derivation 
direct, 52 
using the principle of orthogonality, 57 
minimum sum of error squares, 412 
misadjustment 
(see names of specific algorithms) 
modeling, 10-11, 125, 156, 463 
modem, 13 


modes of convergence 
(see names of specific algorithms) 
moving average (MA) model, 15 
multidelay fast block LMS, 267 
multipath communication channel, 480, 
587 
fade rate, 482 
multirate signal processing, 294, 607 
analysis filter bank, 295, 504, 683 
decimation, 296, 300, 590 
interpolation, 297, 590 
synthesis filter bank, 297, 298, 504 
subband and full-band signals, 294 
weighted overlap-add methods, 296-9 
(see also complementary filter banks; 
DFT filter banks; Low-delay 
analysis and synthesis filter 
banks; subband adaptive filters) 
multivariate random-walk process, 464 
mutually exclusive spectral bands, 
processes with, 211, 242, 294 


narrow-band adaptive filters 
(see adaptive line enhancement) 
narrowband sensor arrays, 660-78 
array topology and parameters, 660 
beamforming methods, 670 
direction of arrival estimation, 665 
signal subspace, noise subspace, and 
spectral factorization, 662 
narrow-band signals, 77, 514, 659 
Newton’s method/algorithm, 131-3, 215 
correction to the gradient vector, 132 
eigenvalues and, 133, 134 
eigenvectors and, 133, 134 
interpretation of, 132 
Karhunen-Loéve transform (KLT) and, 
133 
learning curve, 133 
mode of convergence, 133 
power normalization and, 134 
stability, 133 
whitening process in, 135 
noise cancelling, adaptive 
(see active noise control; noise 
cancellation) 
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noise cancellation, 74—9 
noise canceller set-up, 74 
primary and reference inputs, 74 
power inversion formula, 76 
noise enhancement, in equalizers, 12, 70, 
73, 593, 648 
non-negative definite correlation matrix, 
91 
non-stationary environment 
(see tracking) 
normalized correlation, 365 
normalized least-mean-square (NLMS), 
170-3 
algorithm, 174 
constrained optimization problem, as a, 
171 
derivation, 171 
geometrical interpretation of, 173 
Nitzberg’s interpretation of, 171, 198 
summary, 174 


observation vector, 90 
OFDM communication systems, 711-30 
carrier acquisition, 716 
Moose’s method, 716 
carrier tracking, 721—30 
blind methods, 723 
pilot-aided methods, 721 
channel estimation, 717 
Least MMSE channel estimator, 719 
ML channel estimator, 718 
channel-tracking methods, 730 
cyclic prefix, 713 
frequency domain equalization, 717 
MIMO-OFDM, 744 
packet structure, 714-16 
long training, 715, 716 
payload, 715 
short preamble, 715 
the principle, 711 
timing acquisition, 717 
Schmidl and Cox method, 717 
omni-directional antenna, 76, 165 
one-step forward prediction, 356 
optimum linear discrete-time filters 
(see linear prediction; Wiener filters) 
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order of N complexity transforms 
(see fast recursive least-squares 
algorithms; sliding transforms) 
order-update equations, 355, 362, 365, 
366, 370, 433 
orthogonal coefficient vectors, 224 
orthogonal complementary projection 
operator, 415 
orthogonal, random variables, 56 
orthogonality of backward prediction 
errors, 361 
orthogonal transforms, 208 
band-partitioning property of, 210, 228 
orthogonalization property of, 211 
orthogonality principle 
(see principle of orthogonality) 
orthonormal matrix, 475 
overlap-add method, 257, 296, 318 
overlap-save method, 257, 258 
matrix formulation of, 258 
oversampling, 344 


parallel processing, 251—2, 641 
parametric spectral analysis, 17, 542 
parametric modeling of random 
processes autoregressive (AR), 
15, 385 
(see also autoregressive modeling 
of random processes) 
autoregressive moving average 
(ARMA), 15 
moving average (MA), 15 
Parseval’s relation, 32, 97, 195, 225, 517, 
609 
partial correlation (PARCOR) coefficients, 
365, 437, 533 
partial response signaling, 13 
partitioned fast block LMS (PFBLMS), 
267-76 
algorithm, 273 
analysis, 269-71 
block diagrams, 269, 272 
computational complexity, 274-5 
example, 275 
computer simulations, 276-9 
constrained on rotational basis, 276 
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constrained versus unconstrained, 
269, 270 
frequency bin filters, 269 
learning curves, 278-9 
misadjustment equations, 274 
modified constrained PFBLMS 
algorithm, 276 
overlapping of partitions, 272 
performance function 
based on deterministic framework, 2 
based on statistical framework, 2 
canonical form, 106 
normalized form, 58 
unconstrained Wiener filter, of, 62 
performance indices, 220 
performance surface 
contour plots, 105 
definition, 89 
examples, 55, 108 
eccentricity, 107, 110, 217 
eigenvalue spread effect, 107 
extension to complex-valued case, 115 
hyperparabola shape, 106 
transversal Wiener filters, of, 104 
phase shift keying (PSK), 9, 58, 633, 636 
positive definite correlation matrix, 91 
power inversion formula, 76 
example of, 76-9 
power line interference cancellation, 18 
power normalization, 135, 390 
power spectral density, 39-41 
alternate forms, 374 
definition, 40 
estimation, 18, 384 
interpretation, 40 
properties, 40 
relationship with autocorrelation 
coefficients, 39, 374 
relationship with linear predictor 
coefficients, 374 
transmission of a stationary process 
through a linear filter, 41 
power spectrum 
(see power spectral density) 
prediction, linear 
(see linear prediction) 


prediction applications, 17-20, 355 
prediction-error filters, 359-60 
(see also linear prediction) 
prediction errors, properties of, 360-61 
prewindowing of input data, 441 
primary input, 21, 74 
principle of correlation cancellation, 69 
principle of orthogonality 
corollary to, 56 
complex-valued signals, for the case 
of, 60 
least-squares estimation, in, 412 
linear predictors, in, 361 
unconstrained Wiener filters, in, 64 
Wiener filters, in, 55 
processing delay (latency), 266, 318 
projection operator, 415 
prototype filter, 295, 312, 514, 542 
pulse-code modulation, adaptive 
differential, 20 
pulse-spreading effect, 12 
(see also intersymbol interference) 


QR-decomposition-based recursive 
least-squares (QRD-RLS), 8 

QRD-RLS algorithm, 8 

quadrature-amplitude modulation (QAM), 
9, 58, 179, 585, 711 

quasi LMS-Newton algorithm, 216 


raised-cosine pulse, 481, 586 
random processes 
(see stochastic processes) 
random variables inner product, 56 
orthogonality, 56 
projection, 56 
subspaces, 58 
random walk, 464 
approximate realization of, 482 
real DFT, 229 
non-recursive sliding realization of, 
238 
receiver noise, 12 
recursive algorithms 
(see names of specific algorithms) 
recursive least-squares (RLS) 


774 


recursive least-squares (RLS) (continued) 
algorithms, classification, 8 
QR-decomposition RLS, 8 
(see also standard recursive 
least-squares algorithm) 
Recursive least-squares estimation 
(see recursive least-squares 
algorithms; recursive 
least-squares lattice 
algorithms; fast transversal 
recursive least-squares 
algorithms) 
Recursive least-squares lattice (RLSL) 
algorithms, 8, 440 
notations and preliminaries, 440 
a priori and a posteriori estimation 
errors, 441 
augmented normal equations, 442 
computational complexity, 454 
conversion factor, 445 
update recursion, 446 
cross-correlations, 441 
update recursions, 447 
least-squares error sums, 441 
update recursions, 443 
numerical stability, 454 
prewindowing of input data, 441 
RLSL algorithm with error feedback, 
450 
summary, 451 
RLSL algorithm using a posteriori 
errors, 450 
summary, 452 
(see also least-squares lattice) 
reference input, 21, 74 
region of convergence, 28 
stable systems, of, 34 
regressor tap-weight vector, 422 
repeated eigenvalues, 91 
rescue variable, 458 
residue theorem, 32 
RLS algorithm 
(see standard recursive least-squares 
algorithms) 
robust beamforming, 683—92 
diagonal loading method, 688 
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methods based on sample matrix 
inversion, 690 
soft-constraint minimization, 686 
roll-off factor, 312, 481, 591 
rotation of the co-ordinate axes, 217 
rotation of the performance surface, 217 
round-off error, 233, 238, 420, 454 


self-tuning regulator (STR), 10 
sensor array processing 
narrowband sensor arrays, 660-78 
array topology and parameters, 660 
beamforming methods, 670 
direction of arrival estimation, 665 
signal subspace, noise subspace, and 
spectral factorization, 662 
broadband sensor arrays, 678-83 
beamforming methods 680 
steering, 679 
robust beamforming, 683—92 
diagonal loading method, 688 
methods based on sample matrix 
inversion, 690 
soft-constraint minimization, 686 
sign algorithm, 168 
signed-regressor algorithm, 168 
sign-sign algorithm, 168 
signal-to-noise power spectral density 
ratio, 68 
simplified LMS algorithms, 167—70 
convergence behavior, 169, 170 
computer simulations, 169 
sign algorithm, 168 
signed-regressor algorithm, 168 
sign-sign algorithm, 168 
sine wave plus noise, correlation matrix, 
100 
sinusoidal interference cancellation, 18, 
332 
sliding transforms, 230-42 
Bruun’s algorithm as a sliding DFT, 
235 
complexity comparisons, 242 
frequency sampling filters, 230 
common property of, 233 
transfer functions of, 232 
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recursive realization of, 231 
stabilization, 233 
stability, 233 
round-off error, 233, 238 
soft equalization, 631-43 
extrinsic information, 631 
interleaver/deinterleaver, 631 
iterative channel estimation and data 
detection, 641 
log-likelihood ratio (LLR), 631 
log-likelihood ratio (LLR), 
computation of, 638 
Markov chain Monte Carlo (MCMC), 
638 
parallel implementation of, 640 
soft MMSE equalizer, 633 
soft decoder, 631 
Gibbs sampler, 638 
Maroy chain Monte Carlo 
(MCMC), 638 
statistical soft equalizer, 635 
transceiver structure, 632 
turbo equalizer, 632 
soft MMSE equalizer, 633 
software implementation, 237, 318, 388 
source coders (see speech coding) 
spectral estimation, 15, 384, 542, 666 
parametric and non-parametric, 17 
spectrum 
(see power spectral density) 
spectrum analysis, 17, 384, 542, 666 
speech coding/processing, 18—20, 355 
voiced and unvoiced sounds, 19 
pitch period, 19 
linear predictive coding, 19 
linear prediction and, 355 
waveform coding, 19—20 
pulse code modulation (PCM), 19 
differential PCM (DPCM), 20 
adaptive DPCM (ADPCM), 20 
ITU recommendation G.726, 20 
ADPCM encoder-decoder, 20 
square-root of a matrix, 113 
square-root raised-cosine pulse, 586 
stationary processes 
(see stochastic processes) 


stability 


(see names of specific algorithms) 

standard recursive least-squares (RLS) 

algorithms, 8, 416-19 

a posteriori and a priori estimation 
errors, 418 

average tap-weight behavior, 422 

comparison with the LMS algorithm, 
426, 428 

computational complexity and alternate 
implementation of, 420 

computer simulations, 427 

convergence behavior, 422-30 

derivation of RLS recursions, 416-19 

effect of initialization on the steady 
state performance of, 431 

eigenvalue spread and, 428 

excess mean-squared error, 426 

fine-tuning process, 426 

forgetting factor, 416 

measure of memory, as a, 416 

gain vector, 417 

independence assumption, 424 

initialization of, 418 

learning curve, 423 

LMS-Newton algorithm and, 426, 465 

misadjustment, 426 

one iteration of, 418 

rank deficiency problem in, 428 

round-off error accumulation in, 421 

stable implementation of, 420 

summary, 419 

tap-weight misalignment, 431 

time constant, 425 

tracking behavior, 416, 471, 485 

transient behavior of, 427 

variable forgetting factor, with, 485 

summary, 487 

weight-error correlation matrix, 422 

weighting factor, 415 

(see also fast recursive least-squares 
algorithms) 


statistical soft equalizer, 635 


(see also soft equalization) 


steepest descent, method of, 120-31 


bounds on the step-size parameter, 123 
effect of eigenvalue spread, 130 
geometrical ratio factors, 130 
learning curve, 127 
numerical example, 128 
modes of convergence, 123, 125, 128 
optimum value of step-size parameter, 
131 
overdamped and underdamped, 123, 
124 
power spectral density effect, 131 
step-size parameter, 122 
search steps, 121 
stability, 123 
time constants, 128 
trajectories, 127 
transient behavior of tap-weight vector, 
125 
numerical example, 125-7 
transient behavior of mean-squared 
error, 125 
step-normalization, 214, 263 
stochastic gradient-based algorithms 
(see least-mean-square algorithm) 
stochastic gradient vector, 140, 176, 254, 
264, 325 
stochastic processes, 35-44
ensemble averages, 35, 44, 148 
ergodicity, 44 
jointly stationary, 37 
mutually exclusive spectral bands, 
with, 211 
stationary in the strict sense, 35 
stationary in the wide sense, 35 
stochastic averages, 35-7 
z-transform representations, 37-8
power spectral density, 38-40
response of linear systems to, 41-4 
subband adaptive filters, 9, 294 
application to acoustic echo 
cancellation, 316-17 
comparison with the FBLMS algorithm,
318-19
computational complexity, 307-8 
decimation factor and aliasing, 
308-10, 315 
delay (latency), 303, 310, 316, 318
misadjustment, 309 
selection of analysis and synthesis 
filters, 304 
slow convergence, problem of, 306 
stability, 304 
structures, 303 
synthesis independent, 303 
synthesis dependent, 303 
(see also low-delay analysis and 
synthesis filter banks) 
superposition, 2 
system function, 33-4 
system identification, 10 
system modeling, 10-11, 66-9, 156-8, 
323 


tap inputs, 3 
tap weights, 3 
tap-input vector, 50 
tapped-delay line filter 
(see transversal filter) 
tap-weight vector, 50 
tap-weight misalignment, 200, 431, 495, 
528 
tap weights perturbation, 151 
target response, 13, 343 
time and ensemble averages, 35, 44, 148 
time constants 
(see names of specific algorithms) 
trace, of a matrix, 94, 146 
tracking, 11, 463 
comparison of adaptive algorithms, 471 
convergence and, 463, 486 
formulation of, 464 
unified study, 465 
(see also generalized formulation of 
the LMS algorithm) 
independence assumption, 464 
multivariate random-walk process, 
464 
noise and lag misadjustments, 
468-9 
optimum step-size parameters for, 469 
process noise vector, 464 
training mode, 11, 13 
transfer function
backward prediction-error filters, 370 
definition, 2, 33 
forward prediction-error filters, 370 
IIR line enhancer, 332 
transform domain adaptive filters 
lattice predictors and, 368 
minimum mean-squared error, 209 
overview, 208 
Wiener-Hopf equation, 209 
(see also transform domain LMS 
algorithm) 
transform domain LMS algorithm, 
213-15 
comparison with the conventional 
LMS algorithm, 220 
comparisons among different 
transforms, 226 
computational complexity, 242 
effect of normalization, 217 
filtering view, 224 
geometrical interpretation of, 217 
guidelines for the selection of the 
transform, 228-9 
improvement factor, 220 
maximum attainable improvement, 222 
Karhunen-Loève transform and, 215,
220
misadjustment of, 215 
modes of convergence, 215 
Newton’s algorithm and, 215 
power-normalization and, 213 
selection of transform, 226 
step-normalization, 214 
summary, 214 
tracking behavior, 465 
(see also discrete transforms; sliding 
transforms) 
transformation matrix, 208 
transversal filter based adaptive 
algorithms 
fast block LMS algorithm 
(see fast block LMS algorithm) 
fast recursive least-squares algorithms 
(see fast recursive least-squares 
algorithms) 


subband adaptive filters 
(see subband adaptive filters) 
transform domain LMS algorithm 
(see transform domain LMS 
algorithm) 
transversal filter, 4 


unconstrained Wiener filters, 61-80
inverse modeling, 69 
modeling, 66 
noise cancellation, 74 
optimum transfer function of, 64 
performance function of, 62 
principle of orthogonality, 64 
Wiener-Hopf equation, 65 
unitary matrix, 93, 208 
unitary similarity transformation, 93 


variable step-size LMS algorithm, 177-9, 
474, 477-85 
bounds on the step-size parameters, 
179, 462 
computer simulations, 480-85 
derivation, 478 
optimal tracking behavior, 477-78, 
480 
step-normalization, 480 
step-size parameters, limiting, 482 
step-size update recursions, 178, 477 
summary, 179 
variations and extensions, 478-80
common step-size parameter, with 
a, 479 
complex-valued data, for, 479 
multiplicative vs. linear increments, 
479 
sign update equation, 478 
vector space of random variables, 57 
Viterbi detector/algorithm, 13, 629 
Volterra filters, 6 


Walsh-Hadamard transform (WHT), 230 
sliding realization of, 249 

waveform coders, 19-20

weak excitation, 131 

weight-error correlation matrix, 149, 422 
weight-error vector, 142, 392, 422
weighting function/factor, 98, 214, 410,
552, 645
relationship with least-squares
estimation, 411


wide-sense stationary processes 


(see stochastic processes) 


Wiener filters, 2, 48 


adaptive filters development based on 
theory of, 7 

applications, 48 

criterion, 49 

example of performance function for 
IIR structure, 64 

example of performance surface for 
FIR structure, 55 

extension to complex-valued case, 
58-61 

minimum mean-squared error, 53, 61 

non-recursive (FIR), 49 

optimum tap weights, 53, 
61 

performance/cost function, 49 

principle of correlation cancellation, 69 

principle of orthogonality, 55-7 

recursive (IIR), 49, 322 


summary, 79-80 
transversal, real-valued case, 50-5 
(see also unconstrained Wiener filters) 


Wiener-Hopf equation 


complex-valued case, 62 

direct derivation, 52 

derivation using the principle of 
orthogonality, 56 

frequency domain interpretation of, 65 

real-valued case, 53 

solution using Levinson-Durbin 
algorithm, 375 

transform domain adaptive filters, for, 
209 

unconstrained filters, for, 65 


window matrices, 258 


z-transform, 28-32


examples, 28-31 

inverse, 29, 31 

inverse integral, 32 
region of convergence, 28 


